DataOps
Data Science Empowerment through
DevOps, Cloud Computing and Building
your own Applications
Kelly O’Briant
Data Science Product Engineer
kelly@rladies.org
@kellrstats | @RLadiesDC
• R-Ladies Washington DC Chapter Founder
and Organizer
• R-Ladies Global unofficial “cloud expert”
• Publish a monthly series called .rprofile
on the rOpenSci blog
• Business Science University
course developer
My Talk Goal:
I want you to leave this conference so
excited, you go back to work and completely
ignore whatever project you’re supposed to
be working on because you’re so pumped up
about building a data product and you can’t
stop yourself from doing it.
Motivation
Why I talk about Data Science Empowerment
R-Ladies events
• How do I get a job as a data scientist/analyst/anything?
• What should I study/learn/do/produce to be a data scientist?
• Am I even a data scientist? Is what I do data science?
Why are data products empowering?
• I use data products to justify/prove to myself that I belong, that my
ideas are valid and to help me communicate with people who are bad
at listening (or when I’m bad at speaking)
Motivation
Traumatic Experiences!
Windows Lab Linux Lab Mac Lab
R-Ladies + International Women’s Day
Twitter Campaign
• Create a Twitter bot using R code
to tweet out a profile for every
woman in our Global speaker
directory
• Project collaboration through GitHub
• Docker linked to a local volume
• Twitter Application(s)
Deploy and Use H2O Machine Learning
Models in Production
• Build and validate a model in Python
working in a Jupyter Notebook with the
H2O machine learning API
• Package the model code as a POJO or
MOJO file
• Deploy the model to H2O.ai STEAM to
create an ML prediction service complete
with a REST API query URL
Create and Maintain a Personal Website
• Use the blogdown package in an
RStudio project to create the
framework for a Hugo static
website
• Create content for the site by
writing R Markdown files
• Compile and deploy the static site –
choose a hosting mechanism:
GitHub? Continuous Integration
with Netlify?
Why are you so into R?
• It’s great for Data Science
• The community at large is awesome
• The female community is awesome
• R integrates with other tech
• It’s growing really fast in cool ways
• I can use it to build cool stuff
#rstats
R-Ladies: a worldwide organization
that promotes gender diversity
in the R community via meetups
and mentorship in a friendly and
safe environment
Back to the topic: DataOps
1. It usually takes a little DevOps to build a Data Product
2. Building more Data Products is empowering – good for your portfolio and soul
What is DevOps?
And why should data-oriented people care about it?
DevOps is…
“A combination of cultural philosophies, practices,
and tools that increases an organization’s ability to
deliver applications and services at high velocity.”
- AWS DevOps Blog
Deliver applications and services at high velocity
Do This – without pulling all
your hair out?
Deliver applications and services at high velocity
Do This – Super Effectively
Host your analysis
• Share
• Publish
• Collaborate
• Prove a point
• Serve a purpose
• Be reproducible
• Save the day
What is DataOps?
DataOps?
Anywhere you can put a little DevOps magic into your data science workflow
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a Hint of DevOps
Build More Data Products
So that you and others can use them to solve real problems
Try Shiny!
The Iris Dataset
Do Machine Learning!
So Hot Right Now
What Species
is this iris??
Credit: xkcd
1. Turn your ideas into R code
• Write functions to generate the
plots you’re envisioning
• Package: ggplot2
• Train and validate a machine
learning model to use
• Package: caret
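library(ggplot2)

# Histogram of one iris measurement (column name given as a string), one panel per species
geom_hist_basic <- function(var) {
  ggplot(iris, aes(x = .data[[var]])) +  # .data[[var]] replaces the deprecated aes_string()
    geom_histogram(bins = 30) +
    facet_wrap(~ Species) +
    labs(x = var)
}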
predict_matrix(fit.knn, validation)
Confusion Matrix and Statistics
Prediction   setosa versicolor virginica
  setosa         10          0         0
  versicolor      0          8         1
  virginica       0          2         9
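The slide shows only the final confusion matrix, so here is a minimal sketch of how a model like fit.knn could be trained and validated with caret. The 80/20 split, the seed and the 10-fold cross-validation are assumptions, not details from the talk, and predict_matrix appears to be the speaker's own helper, so this sketch uses caret's confusionMatrix() in its place.

```r
library(caret)

set.seed(42)  # assumed seed, only so the split is repeatable

# Hold out 20% of iris (30 rows) for validation, stratified by species
train_idx  <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training   <- iris[train_idx, ]
validation <- iris[-train_idx, ]

# Fit k-nearest neighbours, tuning k with 10-fold cross-validation
fit.knn <- train(
  Species ~ .,
  data      = training,
  method    = "knn",
  trControl = trainControl(method = "cv", number = 10)
)

# Score the held-out rows and tabulate predictions against the true labels
predictions <- predict(fit.knn, newdata = validation)
cm <- confusionMatrix(predictions, validation$Species)
print(cm)
```

The exact counts depend on the random split, so they will not necessarily match the matrix on the slide.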
2. Turn your R code into an R Shiny app
• Client Side Code (fluidPage code): User Interface and Input Elements
• Server Side Code (serverFunction code): (Reactive) R Output Elements
shinyApp(ui = fluidPage, server = serverFunction)
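To make the client/server split concrete, here is a small sketch of a Shiny app built around the iris histogram idea. The input and output IDs ("var", "hist"), the title and the layout are illustrative assumptions rather than the app shown in the talk.

```r
library(shiny)
library(ggplot2)

# Client side: a dropdown of the numeric iris columns plus a plot placeholder
ui <- fluidPage(
  titlePanel("Iris explorer"),
  selectInput("var", "Measurement", choices = names(iris)[1:4]),
  plotOutput("hist")
)

# Server side: the histogram is re-rendered whenever input$var changes
server <- function(input, output) {
  output$hist <- renderPlot({
    ggplot(iris, aes(x = .data[[input$var]])) +
      geom_histogram(bins = 30) +
      facet_wrap(~ Species) +
      labs(x = input$var)
  })
}

# Combine both halves into an app object
app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch the app locally
```

Keeping the app in an object rather than launching it right away makes the script safe to source. In an interactive session, printing app or calling runApp(app) starts it.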
Try Plumber!
Let’s Build a REST API with R
1. Write Functions in R
Expose Data or Model
Produce Analysis or Visualization
Data Agnostic
Perform Analysis on New Data
2. Create Plumber API Endpoints
- GET
- POST
3. Host the Plumber Script on a Server
- Create Plumber router object
- Run in an R Session
4. Send Requests to the Plumber Service
Through external (or internal) Applications
- Jupyter Notebooks
- Web Apps
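The steps above can be sketched in a few lines of R. This version assumes plumber 1.0 or later and uses its programmatic interface (pr(), pr_get(), pr_post()); the same endpoints could instead be declared with #* annotations in a Plumber.R file and loaded with plumb(). The function names, endpoint paths and port are hypothetical.

```r
library(plumber)

# 1. Plain R functions (hypothetical examples)
# Expose data: the mean of one numeric iris column
column_mean <- function(var = "Sepal.Length") {
  list(variable = var, mean = mean(iris[[var]]))
}

# Data agnostic: summarise whatever numbers the caller sends
summarise_values <- function(values) {
  values <- as.numeric(unlist(values))
  list(n = length(values), mean = mean(values))
}

# 2. Attach the functions to GET and POST endpoints on a router object
router <- pr()
router <- pr_get(router, "/mean", column_mean)
router <- pr_post(router, "/summary", summarise_values)

# 3. Host the router in this R session (blocks until interrupted)
# pr_run(router, host = "0.0.0.0", port = 8000)
```

With the service running, step 4 is an ordinary HTTP call from any client, for example a GET request to http://localhost:8000/mean?var=Sepal.Width or a POST to /summary with a JSON body such as {"values": [1, 2, 3]}.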
Demo Framework
[Diagram: a Docker image runs RStudio Server, and inside it an R session runs the Plumber REST API. A local volume link connects the container to my local file system (Plumber.R, Dockerfile). Applications and notebooks send requests to the API.]
That’s it!
Now go build some sweet data products
Resources for Learning R
R-Ladies Global Meetups
• Get involved!
• More female speakers,
leaders, teachers, builders,
friends!
RLadies.org
@RLadiesGlobal
RStudio Webinars
• All of the talks
from rstudio::conf
2018 have just
been published
• Highly
recommend!
Resources for Learning Shiny Development
shiny.rstudio.com
Resources for Learning Plumber
www.rplumber.io
@TrestleJeff
on Twitter!
Note to self: Remember to give
out stickers
I have R-Ladies and R-Ladies Plumber Stickers!
I’m Kelly!
@kellrstats on Twitter
