Data Science Demystified
Emily Robinson
What is Data Science?
A common question
What is data science?
What do data scientists do?
What is Data Science? The reality
https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
AI needs a lot of support
What is a data scientist?
What do data scientists do? The Reality
One definition
"Data science is the discipline of making data useful." - Cassie Kozyrkov
https://hackernoon.com/what-on-earth-is-data-science-eb1237d8cb37
Classic data science venn diagram
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Our (slightly updated) version
Programming
OR
Benefit of programming: Accessibility
• Web APIs (httr)
• SQL databases (DBI, dbplyr)
• Historical data
Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo
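SQL databases are one of the sources the slide lists. As a rough illustration of what DBI and dbplyr do in R, here is a minimal Python sketch of the same idea: send a query from code instead of exporting data by hand. It uses the standard-library `sqlite3` module with a hypothetical in-memory `orders` table (the table name, columns, and rows are made up for the example); a real project would connect to its own database instead.

```python
import sqlite3

def load_customer_totals(conn):
    """Pull per-customer order counts and spend with SQL, not manual exports."""
    query = """
        SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total_spent
        FROM orders
        GROUP BY customer_id
        ORDER BY customer_id
    """
    return conn.execute(query).fetchall()

# Hypothetical in-memory database standing in for a production SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 35.0), (2, 500.0), (3, 12.5)],
)

rows = load_customer_totals(conn)
print(rows)  # [(1, 2, 55.0), (2, 1, 500.0), (3, 1, 12.5)]
```

Because the query lives in code, it can be rerun or shared unchanged, which is the same argument the efficiency and collaboration slides make.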
Benefit of programming: Efficiency
• Code away repetitive tasks
• Code around obstacles
• Limit human error
Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo
Benefit of programming: Collaboration
• Increased shareability
• Communicable processes
• Dependable replicability
Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo
Benefits of programming
• Accessibility: Web APIs, SQL databases, historical data (httr, DBI, dbplyr)
• Efficiency: code away repetitive tasks, code around obstacles, limit human error
• Collaboration: increased shareability, communicable processes, dependable replicability
Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo
Mathematics & statistics
1. What techniques exist
• I need to group customers together -> I should try clustering
2. How to apply them
• How to run k-means clustering in R/Python
3. How to choose which to try
• What clustering method will work best?
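To make step 2 concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python with no libraries assumed. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The deterministic farthest-first seeding is a simplification chosen so the example is reproducible (libraries such as scikit-learn default to k-means++ seeding instead), and the customer coordinates are hypothetical.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers described by two features (e.g. tenure, spend).
points = [(1, 1), (1.5, 2), (1, 1.5),
          (8, 8), (8.5, 9), (9, 8),
          (1, 9), (1.5, 8.5), (2, 9)]
labels, centroids = kmeans(points, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Step 3 still takes judgment: this sketch simply assumes k = 3, and on real data the number of clusters and the choice of method would need to be evaluated.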
Statistics: going beyond the numbers
Which is greater: 4 out of 10, or 390 out of 1000?
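The raw rates are nearly identical (0.40 vs 0.39), but 10 trials say far less than 1,000. One way to show this, chosen here for illustration rather than taken from the talk, is a 95% Wilson score interval for each proportion (z = 1.96 is the usual normal quantile for 95%):

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

for s, n in [(4, 10), (390, 1000)]:
    lo, hi = wilson_interval(s, n)
    print(f"{s}/{n}: estimate {s / n:.2f}, 95% interval ({lo:.2f}, {hi:.2f})")
```

The 4/10 interval runs from about 0.17 to 0.69, while the 390/1000 interval runs from about 0.36 to 0.42, so the small sample cannot tell us which underlying rate is larger.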
Statistics: going beyond the numbers
Who are the worst batters?
Who are the best batters?
http://varianceexplained.org/r/credible_intervals_baseball/
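The linked post answers these questions with empirical Bayes: estimate a beta prior from all batters, then shrink each batter's average toward it, so a 1-for-1 record is not ranked as the best. Below is a minimal sketch of the shrinkage step. The prior Beta(80, 220) is hypothetical (not the values fitted in the post), and the three records are made up for illustration.

```python
import math

# Hypothetical prior: Beta(80, 220), whose mean is 80 / 300 ≈ 0.267.
ALPHA0, BETA0 = 80.0, 220.0

def shrunk_average(hits, at_bats, alpha0=ALPHA0, beta0=BETA0):
    """Posterior mean of a Beta(alpha0, beta0) prior updated with the batter's record."""
    return (hits + alpha0) / (at_bats + alpha0 + beta0)

def posterior_sd(hits, at_bats, alpha0=ALPHA0, beta0=BETA0):
    """Standard deviation of the Beta posterior; it shrinks as at-bats grow."""
    a = alpha0 + hits
    b = beta0 + (at_bats - hits)
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Made-up records: name -> (hits, at_bats)
records = {"1 for 1": (1, 1), "0 for 3": (0, 3), "300 for 1000": (300, 1000)}

ranking = sorted(records, key=lambda name: shrunk_average(*records[name]), reverse=True)
for name in ranking:
    h, ab = records[name]
    print(f"{name}: raw {h / ab:.3f}, shrunk {shrunk_average(h, ab):.3f} "
          f"± {posterior_sd(h, ab):.3f}")
```

After shrinkage, the 300-for-1000 hitter ranks first even though the 1-for-1 record has a raw average of 1.000, because one at-bat barely moves the estimate away from the prior.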
Business question: How can we split our customers into different groups to market to?
Data science question: How can we run a clustering algorithm to segment customer data?
Data science answer: A k-means clustering found 3 distinct groups.
Business answer: Here are 3 types of customers: new, high-spending, and commercial.
Domain knowledge
- Renee Teate, @BecomingDataSci
Skills:
• Communication
• Empathy
• Understanding your data (where it lives, built-in assumptions, edge cases)
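Understanding your data includes checking for edge cases before trusting an analysis. Here is a minimal Python sketch of a few such checks (missing values, duplicate IDs, implausible numbers). The field names, the 0 to 120 age range, and the records are all hypothetical.

```python
def audit(records, id_field, numeric_field, lo, hi):
    """Flag common edge cases: missing values, duplicate IDs, out-of-range numbers."""
    seen = set()
    report = {"missing": [], "duplicate_ids": [], "out_of_range": []}
    for i, row in enumerate(records):
        # Any field left empty is recorded by row index.
        if any(v is None for v in row.values()):
            report["missing"].append(i)
        # An ID seen before is recorded by its value.
        rid = row.get(id_field)
        if rid in seen:
            report["duplicate_ids"].append(rid)
        seen.add(rid)
        # Values outside the expected range are recorded by row index.
        value = row.get(numeric_field)
        if value is not None and not (lo <= value <= hi):
            report["out_of_range"].append(i)
    return report

rows = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": None},   # missing value
    {"customer_id": 2, "age": 41},     # duplicate ID
    {"customer_id": 3, "age": 212},    # implausible age
]
print(audit(rows, "customer_id", "age", 0, 120))
# {'missing': [1], 'duplicate_ids': [2], 'out_of_range': [3]}
```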
Three sub-categories
https://www.linkedin.com/pulse/one-data-science-job-doesnt-fit-all-elena-grewal/
Analytics
From Airbnb Careers
• Define and evaluate key metrics
• Develop dashboards
• Communicate analyses
• Comfortable in SQL
• Industry experience
Algorithms
From Airbnb Careers
• Deep learning techniques
• Natural language processing
• Strong programming skills
• Developing ML models at scale in
Inference
From Airbnb Careers
• Run strategic analysis
• Design experiments
• Improve statistical methodology
• PhD in a quantitative field
Three completely different job descriptions
From Airbnb Careers
How Do I Grow My Data Science Skills?
Become a data scientist in 3 easy steps
https://towardsdatascience.com/from-data-analyst-to-data-scientist-f67a724ea265, Ben Stanbury
“Must know” lists
You don’t need to know everything
#1 Advice
Practice by doing personal data science projects
Why?
“Don’t get stressed about keeping up with the cutting edge of the field … You should start by getting very comfortable transforming and visualizing data, programming with a wide variety of packages, and using statistical techniques like hypothesis tests, classification, and regression.”
- David Robinson, Data Insights Engineering Manager at Flatiron Health, Chapter 4
How?
Dataset -> Question
Question -> Dataset
https://theambitiouseconomist.com/an-analysis-of-the-gender-wage-gap-in-australia/
Tip 1: Include visualizations
https://hackernoon.com/more-than-a-million-pro-repeal-net-neutrality-comments-were-likely-faked-e9f0e3ed36a6
Tip 2: Choose a topic you’re excited about
https://masalmon.eu/2018/01/01/sortinghat/
Tip 3: Limit your scope
https://kkulma.github.io/2017-08-13-friendships-among-top-r-twitterers/
Making progress
Inspired by bit.ly/drob-rstudio-2019
Less valuable -> more valuable: Idea -> Getting data -> Cleaning -> Exploratory -> Modeling -> Final result
Less valuable -> more valuable: Work only on your computer -> Work online (GitHub, Blog, Kaggle)
How I used to think about analyses
How I think about analyses now
The full process
Put it on GitHub
Conclusion
The potential future of data scientists
From https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-scientists
“It wouldn’t surprise me if the [data scientist] title goes the way of the ‘webmaster’”
- Hilary Mason
Resources
• Day in the life of a data scientist webinar by David Robinson
• You’re not paid to model by Jacqueline Nolis
• Doing data science at Twitter by Robert Chang
• Succeeding as a data scientist in small companies/startups by Randy Au
• How to change careers and become a data scientist – one quant’s experience by Rachel Thomas
• What data scientists really do, according to 35 data scientists by Hugo Bowne-Anderson
Getting started with data science
Thank you!
hookedondata.org
@robinson_es
datascicareer.com
40% off w/ pcrobinson
