If you’re using a laptop, start
               installing LaTeX by following the
               instructions on the website.



Thursday, 2 September 2010
Stat405: Statistical reports

Hadley Wickham

1. More subsetting.
2. Missing values.
3. Statistical reports: data, code, graphics & written report




Office hours
               Me: before class, DH 2056
               Garrett: Wednesday, 3pm, DH 1041

               Lab access: you should now have it




Saving results
               # Prints to screen
               diamonds[diamonds$x > 10, ]

               # Saves to new data frame
               big <- diamonds[diamonds$x > 10, ]

               # Overwrites existing data frame. Dangerous!
               diamonds <- diamonds[diamonds$x < 10, ]



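# Accidentally overwrite the data frame with a single value
diamonds <- diamonds[1, 1]
diamonds

# Uh oh!

# Remove our copy; the name now refers to the packaged dataset again
rm(diamonds)
str(diamonds)

# Phew!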




Your turn
                    Create a logical vector that selects
                    diamonds with equal x & y. Create a new
                    dataset that only contains these values.

                    Create a logical vector that selects
                    diamonds with incorrect/unusual x, y, or z
                    values. Create a new dataset that omits
                    these values. (Hint: do this one variable
                    at a time.)


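# Diamonds whose x and y dimensions are equal
equal_dim <- diamonds$x == diamonds$y
equal <- diamonds[equal_dim, ]

# Unusually large y or z values
y_big <- diamonds$y > 10
z_big <- diamonds$z > 6

# A dimension of zero is impossible, so flag it as an error
x_zero <- diamonds$x == 0
y_zero <- diamonds$y == 0
z_zero <- diamonds$z == 0
zeros <- x_zero | y_zero | z_zero

# Keep only the rows with no problem flagged
bad <- y_big | z_big | zeros
good <- diamonds[!bad, ]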


Missing values

Data errors

                    Typically, removing the entire row because
                    of one error is overkill. It is better to
                    selectively replace problem values with
                    missing values.
                    In R, missing values are indicated by NA.




Expression        Guess   Actual
5 + NA
NA / 2
sum(c(5, NA))
mean(c(5, NA))
NA < 3
NA == 3
NA == NA

NA behaviour

                    Missing values propagate.
                    Use is.na() to check for missing values.
                    Many functions (e.g. sum and mean) have an
                    na.rm argument to remove missing values
                    prior to computation.
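
The three points above can be checked directly at the console. The following is a minimal sketch in base R; the vector `x` is a made-up example, not part of the diamonds data.

```r
# Missing values propagate through arithmetic and comparisons
5 + NA        # NA
NA / 2        # NA
NA < 3        # NA
NA == NA      # NA: two unknown values cannot be shown to be equal

# Summaries of a vector containing NA are also NA by default
x <- c(5, NA)
sum(x)        # NA
mean(x)       # NA

# Use is.na() rather than == to find missing values
is.na(x)      # FALSE TRUE

# na.rm = TRUE drops missing values before computing
sum(x, na.rm = TRUE)   # 5
mean(x, na.rm = TRUE)  # 5
```

Because `NA == NA` is itself `NA`, a test such as `x == NA` never identifies missing values, which is why `is.na()` is needed.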



# Can use subsetting + <- to change individual values

diamonds$x[diamonds$x == 0] <- NA
diamonds$y[diamonds$y == 0] <- NA
diamonds$z[diamonds$z == 0] <- NA

y_big <- !is.na(diamonds$y) & diamonds$y > 10
diamonds$y[y_big] <- diamonds$y[y_big] / 10
z_big <- !is.na(diamonds$z) & diamonds$z > 6
diamonds$z[z_big] <- diamonds$z[z_big] / 10




Your turn


                    What happens if you don’t remove the
                    missing values during the subsetting
                    replacement? Why?
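
One way to explore this question is with a small stand-in vector. The sketch below assumes a made-up vector `y` in place of `diamonds$y`; the names `y_big_unsafe` and `result` are illustrative only. In current versions of R the unguarded assignment should fail with an error about NAs in subscripted assignments, so it is wrapped in `tryCatch()` to capture the message.

```r
# A small stand-in for diamonds$y (made-up values, not the real data)
y <- c(4.2, NA, 58.9)

# Without the is.na() guard, the logical index itself contains an NA
y_big_unsafe <- y > 10
y_big_unsafe

# Assigning several values through an index that contains NA is an error,
# because R cannot tell whether the NA position should be replaced
result <- tryCatch(
  {
    y[y_big_unsafe] <- y[y_big_unsafe] / 10
    "no error"
  },
  error = function(e) conditionMessage(e)
)
result

# With the guard, NA positions become FALSE and only real values change
y_big <- !is.na(y) & y > 10
y[y_big] <- y[y_big] / 10
y
```

Replacing with a single value, as in `diamonds$x[diamonds$x == 0] <- NA`, is allowed even when the index contains NA; the problem only arises when the replacement has more than one element.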




Statistical reports

                    Regardless of whether you go into academia
                    or industry, you need to be able to present
                    your findings.
                    And you should be able to do more than just
                    present them: you should be able to
                    reproduce them.




In one directory

                    Data (.csv)
                    +
                    Code (.r)
                    +
                    Graphics (.png, .pdf)
                    +
                    Written report (.tex)

Working directory
                    Set your working directory to specify where
                    files will be loaded from and saved to.
                    From the terminal (Linux or Mac): the
                    working directory is the directory you’re in
                    when you start R.
                    On Windows: File | Change dir.
                    On the Mac: ⌘-D
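
The working directory can also be inspected and changed from within R using `getwd()` and `setwd()`. The sketch below is illustrative only: it uses a temporary directory as a stand-in for a hypothetical project folder, and `example.csv` is a made-up file name.

```r
# Remember the current working directory so it can be restored later
old_dir <- getwd()

# Switch to another directory (a temporary one here, standing in for a
# hypothetical project folder) and confirm the change
project_dir <- tempdir()
setwd(project_dir)
getwd()

# Relative file names are now resolved inside project_dir
writeLines("carat,price", "example.csv")
file.exists(file.path(project_dir, "example.csv"))

# Restore the original working directory
setwd(old_dir)
```

Setting the directory once at the start of a session means that data, graphics and the report can all be referred to by short relative names.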


Data
              So far we’ve just used built-in datasets.
           Next week we’ll learn how to use external data.



Code

Workflow
                    At the end of each interactive session, you
                    want a summary of everything you did.
                    Two options:
                             Save everything that you did with
                             savehistory("filename.r"), then remove the
                             unimportant bits.
                             Build up the important bits as you go.
                    Up to you; I prefer the second.

R editor

                    Linux: gedit
                    (copy and paste - see website)

                    Windows: File | New Script
                    (press F5 to send line)

                    Mac: File | New document
                    (press command-enter to send)




Code is communication!


Code presentation
                    Use comments (#) to describe what you are
                    doing and to create scannable headings in
                    your code.
                    Every comma should be followed by a space,
                    and every mathematical operator (+, -, =, *, /,
                    etc.) should be surrounded by spaces.
                    Parentheses do not need spaces.
                    Lines should be at most 80 characters. If you
                    have to break up a line, indent the following
                    piece.

qplot(table,depth,data=diamonds)
qplot(table,depth,data=diamonds)+xlim(50,70)+ylim(50,70)
qplot(table-depth,data=diamonds,geom="histogram")
qplot(table/depth,data=diamonds,geom="histogram",binwidth=0.01)+xlim(0.8,1.2)




# Table and depth -------------------------

                  qplot(table, depth, data = diamonds)
                  qplot(table, depth, data = diamonds) +
                    xlim(50, 70) + ylim(50, 70)

                  # Is there a linear relationship?
                  qplot(table - depth, data = diamonds,
                    geom = "histogram")

                  # This bin width seems the most revealing
                  qplot(table / depth, data = diamonds,
                    geom = "histogram", binwidth = 0.01) +
                    xlim(0.8, 1.2)
                  # Also tried: 0.05, 0.005, 0.002


Graphics

Saving graphics
                     # Uses size on screen:
                     ggsave("my-plot.pdf")
                     ggsave("my-plot.png")

                     # Specify size
                     ggsave("my-plot.pdf",
                       width = 6, height = 6)

                     # Remember to set your working
                     # directory!
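
By default `ggsave()` saves the last plot displayed. A slightly more explicit variant, sketched below, keeps the plot in a variable and passes it through the `plot` argument. The variable names and the temporary output path are assumptions made for this example, and it uses `ggplot()` directly rather than `qplot()`.

```r
library(ggplot2)

# Build a plot and keep it in a variable so it can be passed to ggsave()
# explicitly instead of relying on the last plot displayed
p <- ggplot(diamonds, aes(table, depth)) +
  geom_point()

# Write into a temporary directory (an assumption for this sketch; for a
# report you would normally save into your working directory)
out_file <- file.path(tempdir(), "table-depth.pdf")

# width and height are in inches by default
ggsave(out_file, plot = p, width = 6, height = 6)

file.exists(out_file)
```

The file type is chosen from the extension, so changing the name to end in `.png` would produce a raster image instead.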

PDF                          PNG

Vector based                 Raster based
(can zoom in infinitely)     (made up of pixels)

Good for most plots          Good for plots with thousands of points


Your turn

                    Recreate some of the graphics from
                    previous lectures and save them.
                    Experiment with the scale and height and
                    width settings.
                    Modify the template to include them.



Written report

LaTeX
                    We are going to use the open source
                    document typesetting system called LaTeX to
                    produce our reports.
                    This is widespread in statistics: if you ever
                    write a journal article, you will probably write
                    it in LaTeX.
                    (Not as useful if you’re not in grad school,
                    but still an important skill.)


Edit-Compile-Preview
                    Edit: a text document with special
                    formatting
                    Compile: to produce a pdf
                    Preview: with a pdf viewer


                    See web page for system specifics.


LaTeX
                    Template
                    Sections
                    Images
                    Figures and cross-references
                    Verbatim input (for code)



Your turn
                    # Get the sample report
                    wget http://had.co.nz/stat405/resources/sample-report.zip
                    unzip sample-report.zip

                    cd sample-report
                    gedit template.tex &
                    pdflatex template.tex
                    evince template.pdf
                    # Experiment!


Your turn

                    If not on linux, follow the instructions on
                    the class website.
                    If you feel comfortable, start on
                    homework 2.




Homework


