Copyright (c) WLOG Solutions
Know your R usage
workflow to handle
reproducibility challenges
Budapest, 2018
Meet the personas
John: student/hobbyist
Kate and Henry: freelancer/scientist/consultant
The Team: corporate/in-house team
They were coding in R
happily until that one
day...
https://xkcd.com/234/
John
Could not deliver his R lab homework because of a
package incompatibility on the professor's
laptop.
Kate and Henry
Missed deadlines because of problems
installing packages for their R Shiny app on the
customer's server running
Red Hat Enterprise Linux 6.8.
The Team
Had serious package version conflicts
caused by many users and many projects
running on a Red Hat Enterprise Linux machine
without internet access.
Three different stories,
the same reproducibility problem.
What is reproducibility?
Reproducibility is the
ability to run your code repeatedly,
at different times,
on different computers,
in such a way as to
obtain the same outputs given the
same inputs.
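As a minimal sketch of this definition in base R (the function name run_analysis and its arguments are hypothetical, not from the talk), a computation is repeatable when every input, including the random seed, is passed in explicitly:

```r
# Hypothetical analysis step: every input, including the seed, is an argument.
run_analysis <- function(seed, n) {
  set.seed(seed)   # fix the random number stream
  x <- rnorm(n)    # simulated data stands in for real inputs
  c(mean = mean(x), sd = sd(x))
}

first  <- run_analysis(seed = 42, n = 100)
second <- run_analysis(seed = 42, n = 100)

# Same inputs, same outputs: this holds for repeated runs in one session.
stopifnot(identical(first, second))
```

This check only covers repeated runs in a single environment. Whether the result also matches at a different time or on a different computer depends on the R version, the installed packages and the operating system underneath the code.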
Bare metal
Operating system
Solution dependencies
Code
Data
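One way to make these layers visible is to record an identifier for each of them next to the results. The sketch below is an illustration in base R, not the speaker's tooling: the names environment_fingerprint and hash_files are hypothetical, and the CPU architecture is only a rough stand-in for the "bare metal" layer.

```r
# Hypothetical helper: MD5 checksums for a set of files (empty input allowed).
hash_files <- function(files) {
  if (length(files) > 0) tools::md5sum(files) else character()
}

# Hypothetical helper: one entry per layer named on the slide.
environment_fingerprint <- function(code_files = character(),
                                    data_files = character(),
                                    packages = character()) {
  info <- Sys.info()
  list(
    bare_metal   = info[["machine"]],                        # CPU architecture only
    os           = paste(info[["sysname"]], info[["release"]]),
    r_version    = R.version.string,
    dependencies = vapply(packages,
                          function(p) as.character(utils::packageVersion(p)),
                          character(1)),
    code         = hash_files(code_files),
    data         = hash_files(data_files)
  )
}

# Usage: fingerprint a small temporary data file and two base packages.
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), tmp)
fp <- environment_fingerprint(data_files = tmp, packages = c("stats", "utils"))
str(fp)
```

Comparing two such fingerprints, for example one from development and one from production, points to the layer in which the environments differ.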
A few examples
2016-01-03: forecast v6.2, R 3.2.3
- Rcpp (>= 0.11)
2016-09-08: forecast v7.2, R 3.3.1
- ggplot2 (>= 2.0.0)
- Rcpp (>= 0.11)
- Added gglagplot
2017-03-01: forecast v8.0, R 3.3.2
- ggplot2 (>= 2.0.0)
- Rcpp (>= 0.11)
- Modified defaults for gglagplot
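To make the timeline concrete, here is a small sketch in base R. It encodes the three snapshots from the slide and looks up which forecast version was current on a given day. The helper name forecast_on is hypothetical, and the lookup assumes that each date on the slide marks the point from which that version was the current one. The last lines show that a declared constraint such as ggplot2 (>= 2.0.0) is only a lower bound, so it does not pin a single version.

```r
# Snapshots taken from the slide.
timeline <- data.frame(
  date     = as.Date(c("2016-01-03", "2016-09-08", "2017-03-01")),
  forecast = c("6.2", "7.2", "8.0"),
  r        = c("3.2.3", "3.3.1", "3.3.2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which forecast version was current on `day`?
forecast_on <- function(day) {
  idx <- findInterval(as.numeric(as.Date(day)), as.numeric(timeline$date))
  if (idx == 0) NA_character_ else timeline$forecast[idx]
}

forecast_on("2016-06-01")  # between the first two snapshots
forecast_on("2017-06-01")  # after the last snapshot

# A ">=" constraint is a lower bound: several versions satisfy it.
candidates <- numeric_version(c("1.0.1", "2.0.0", "2.2.1"))
candidates >= "2.0.0"
```

The same script installed on two of these dates would therefore run against different forecast code, even though the declared dependencies look almost identical.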
Development Production
I recommend using
rocker/r-ver
When is reproducibility
important while you
program in R?
Deploy (share) solution to production
Development: Debian/Ubuntu, RedHat/CentOS, Windows
Production: Debian/Ubuntu, RedHat/CentOS, Windows
Restore development environment
Development: Debian/Ubuntu, RedHat/CentOS, Windows
Development’: Debian/Ubuntu, RedHat/CentOS, Windows
Three workflows,
three reproducibility solutions.
John, student/hobbyist
Dev/Production
Version control
Family & Friends or Professor
MRAN
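One way a setup like John's could pin packages by date is to point R at a dated MRAN snapshot. The sketch below is an assumption about how that might look, not the speaker's code: the helper snapshot_repo is hypothetical, and the URL pattern is the one MRAN used for its daily snapshots. Microsoft has announced the retirement of MRAN, so the URL may no longer resolve; treat it as a stand-in for whichever dated CRAN snapshot service is available.

```r
# Hypothetical helper: build the repository URL for a dated snapshot.
snapshot_repo <- function(date) {
  day <- format(as.Date(date), "%Y-%m-%d")  # validates and normalises the date
  paste0("https://mran.microsoft.com/snapshot/", day)
}

repo <- snapshot_repo("2017-03-01")

# Use the snapshot for this session only. install.packages() would then
# resolve packages as they were on that day (this step needs network access).
options(repos = c(CRAN = repo))
getOption("repos")[["CRAN"]]
```

Keeping the snapshot date in version control next to the code lets a professor or friend install the same package versions later.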
Kate and Henry, consultancy team/freelancer/scientist
Dev
Production
Continuous integration
Version control
Local CRAN
MRAN
On-premise, cloud, Spark, etc.
The Team, corporate/in-house team
Dev
Production
Continuous integration
Version control
Local CRAN
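A local CRAN of the kind listed here can be approximated with a directory that follows the CRAN layout and is reached over file:// or an internal web server. The following is a minimal sketch under that assumption. The directory location and the helper name init_local_repo are hypothetical, and the indexing step is left as a comment because it is only useful once package tarballs are present.

```r
# Hypothetical helper: create the directory layout R expects for source packages.
init_local_repo <- function(root) {
  contrib <- file.path(root, "src", "contrib")
  dir.create(contrib, recursive = TRUE, showWarnings = FALSE)
  # After copying package tarballs (*.tar.gz) into `contrib`, build the index:
  # tools::write_PACKAGES(contrib, type = "source")
  contrib
}

root    <- file.path(tempdir(), "local-cran")
contrib <- init_local_repo(root)

# Repository URL for install.packages(); no internet access is needed.
# Windows paths need an extra slash after "file://".
path     <- normalizePath(root, winslash = "/")
prefix   <- if (.Platform$OS.type == "windows") "file:///" else "file://"
repo_url <- paste0(prefix, path)

# Where R will look for source packages inside this repository.
utils::contrib.url(repo_url, type = "source")
```

Because every project installs from the same controlled repository, package versions change only when someone updates that repository on purpose.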
One word on Docker
Development Production
Build for different OS
Deployment package (.zip)
Second word on Docker
Development Production
Build Docker image
CRAN management
Multiple R versions
Debian/Ubuntu
Windows
RedHat/CentOS
Docker
Jenkins
Isolated projects
No installation on prod
Internetless environments
System requirements
Git/SVN
Binary packages
http://rsuite.io
https://github.com/WLOGSolutions/RSuite
https://www.slideshare.net/WLOGSolutions
Wit Jakuczun
CEO
wit.Jakuczun@wlogsolutions.com
+48 601820620
http://www.wlogsolutions.com
