Backstage to Data Driven Culture
Success with an Agile Data Science Stack
Big Data LA Day 2016
Pauline Chow
So, You are the First Data Scientist…?
Misconceptions about Data Scientists
What my Friends Think I Do
What my Mom Thinks I Do
What Society Thinks I Do
What my Boss Thinks I Do
What I Think I Do
What I Actually Do
So, You are the First or Lead Data
Scientist…?
Context for Presentation
Case Study: Startup in Digital Media
- Open Source & New Tools
- Profits Steady, Adding Products
- Report to VP Marketing
- Non-Technical Culture
- First Data Scientist
What are the business's core competencies?
What does the organization do best? How does it relate to data and technology?
What Tools are Available?
What are the existing tools, processes, and code? Do you have a budget for new tools and resources?
Where is your Team?
This is a question about both team members and expectations.
What is the State of the Organization?
What is the mood of the organization? How are they solving problems? Why are they adding DS/A to the organization?
Who has the Influence on the Roadmap?
Who are the stakeholders? How can data contribute to their goals and expectations?
Key Drivers Integrating Data Culture
Non-technical focused
Create an Agile Data Science Stack
Set a Blueprint that promotes flexibility, iteration, and scalability. It facilitates agile-oriented mindsets for data practices and is crucial for implementation.
Effectively Implement Solutions
Build a Roadmap from the Blueprint to shape data practices and implement goals from stakeholders and the company, as well as strong DS/A foundations.
Maximize Impact & Communication
Develop key qualitative and quantitative milestones. Communicate consistently and frequently to the organization.
Influence Expectations
Influence from both angles: your expectations and those of stakeholders. Find explicit and implicit goals and bridge the gaps that you find.
Guiding Verbs for “First” Data Scientist
In no particular order
- Actively Listen
- Implement
- Explore
- Collaborate
- Influence
- Grow
ACTIVE LISTENING:
What Are you Trying to Hear?
Explicit Goals & Expectations
Structured, straightforward, logical, and safe inquiries
Document, share, and openly discuss with team members and stakeholders.
Jungwoo Hong @ Unsplash
Implicit Goals & Expectations
Thom @ Unsplash
IMPLEMENT:
HOW TO APPROACH YOUR
BLUEPRINT FOR DATA
DRIVEN-INFORMED
CULTURE?
STACK AGILE APPROACHES
Architecture First
Process First
Anthony Delanoix @ Unsplash Jeff Sheldon @ Unsplash
AGILE BY ARCHITECTURE
Blueprint approach from an infrastructure perspective
SaaS & PaaS Integration
Customize as the team grows
AGILE BY PROCESS
Blueprint approach from a workflow perspective
Workflow stages: IDENTIFY, ACQUIRE, PARSE & MINE, BUILD, PRESENT, DEPLOY
CREATE PROBLEM STATEMENT
- Identify business, data, product objectives
- Brainstorm potential solutions
- Create questions and identify people/stakeholders to help
ACQUIRE DATA
- Identify the “right” source
- Import data and set up remote / local storage
- Determine tools to work with selected sources
PARSE & MINE DATA
- Determine distribution of data and necessary transformations
- Format, clean, splice, etc.
- Create new derived data
BUILD SYS & MODELS
- Select appropriate models
- Build models and pipelines for scalability
- Evaluate and refine models
PRESENT RESULTS
- Summarize findings
- Add storytelling aspects
- Identify next questions and additional analysis
- For teams and stakeholders
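The workflow stages above can be sketched as a chain of small functions, each handing its output to the next. This is only an illustration under stated assumptions: the function names, the sample records, and the "heavy user" threshold are all hypothetical and do not come from the talk.

```python
from statistics import mean


def acquire():
    # ACQUIRE DATA: hypothetical raw records standing in for a real source.
    return [
        {"user": "a", "minutes": "12"},
        {"user": "b", "minutes": ""},      # missing value
        {"user": "c", "minutes": "30"},
    ]


def parse_and_mine(rows):
    # PARSE & MINE DATA: clean (drop missing), format (cast), derive (new flag).
    cleaned = []
    for row in rows:
        if row["minutes"] == "":
            continue
        minutes = int(row["minutes"])
        cleaned.append(
            {"user": row["user"], "minutes": minutes, "heavy_user": minutes >= 20}
        )
    return cleaned


def build_model(rows):
    # BUILD SYS & MODELS: a stand-in "model" that is just the average.
    return mean(r["minutes"] for r in rows)


def present(rows, baseline):
    # PRESENT RESULTS: summarize findings in one sentence.
    heavy = sum(r["heavy_user"] for r in rows)
    return f"{len(rows)} users analyzed; baseline {baseline:.1f} min; {heavy} heavy user(s)"


rows = parse_and_mine(acquire())
baseline = build_model(rows)
summary = present(rows, baseline)
print(summary)  # 2 users analyzed; baseline 21.0 min; 1 heavy user(s)
```

Keeping each stage as its own function makes it easier to swap one stage for a SaaS or PaaS tool later without rewriting the rest.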
SaaS & PaaS Integration
Customize as the Process Increases in Complexity
Workflow stages: IDENTIFY, ACQUIRE, PARSE & MINE, BUILD, PRESENT, DEPLOY
CREATE PROBLEM STATEMENT
Keep most attributes of this section in-house and within your team.
ACQUIRE DATA
Curate data from existing sources that are cleaned, reliable, and automated, where ETL can be skipped.
- Segment.io
- Zapier
- CrowdFlower
- Open Data
PARSE & MINE DATA
For data that cannot be automated or acquired cleanly, sklearn pipelines or the open source Luigi (Spotify) or Airflow (Airbnb) can ease this process.
BUILD SYS & MODELS + DEPLOY
Leverage platforms that document models, pipelines, and feature iterations. Collaboration is a plus.
- Sklearn pipelines
- DS/ML platforms: Yhat, Domino Data Lab, Anaconda
PRESENT RESULTS
Adopt platforms that allow iterations and the data mining/parsing process to feed into reports and presentations.
- IPython/Jupyter Notebooks
- Dashboards: Looker, RJMetrics, Tableau
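Workflow tools such as Luigi and Airflow organize work as tasks with declared dependencies and run them in a valid order. The sketch below shows that idea only, using Python's standard-library `graphlib` rather than either tool's API. The task names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks, each mapped to the set of tasks it depends on.
dependencies = {
    "acquire": set(),
    "parse_and_mine": {"acquire"},
    "build_model": {"parse_and_mine"},
    "present": {"build_model"},
    "deploy": {"build_model"},
}

# Resolve an execution order in which every task follows its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())

for task in run_order:
    # A real scheduler would execute the task here; this sketch only reports it.
    upstream = ", ".join(sorted(dependencies[task])) or "nothing"
    print(f"run {task} (after {upstream})")
```

Declaring dependencies instead of hard-coding an order is what lets the process be customized as it grows in complexity: adding a task means adding one entry to the map.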
COLLABORATE:
What Metrics to Emphasize for
Teamwork?
Burn Rate
Most companies do not broadcast their burn rate widely, but transparency can put decisions into perspective for the organization. Time and urgency can also be of the essence.
Customer Acquisition Cost (CAC)
Illustrates how competitive your products and services are in the market, and how saturated that market is. Social media ad platforms can make up a large portion of these costs.
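As a rough illustration, burn rate and CAC can each be computed with one division. The definitions below are common textbook ones (net cash spent per month, and acquisition spend per new customer), not formulas given in the talk, and every figure is made up.

```python
def monthly_burn(cash_start, cash_end, months):
    # Net cash spent per month over the period.
    return (cash_start - cash_end) / months


def runway_months(cash_on_hand, burn):
    # How long the remaining cash lasts at the current burn.
    return cash_on_hand / burn


def customer_acquisition_cost(acquisition_spend, new_customers):
    # Sales and marketing spend divided by customers gained in the same period.
    return acquisition_spend / new_customers


# Hypothetical figures for one quarter.
burn = monthly_burn(cash_start=600_000, cash_end=450_000, months=3)
runway = runway_months(cash_on_hand=450_000, burn=burn)
cac = customer_acquisition_cost(acquisition_spend=40_000, new_customers=500)

print(burn, runway, cac)  # 50000.0 9.0 80.0
```

Reporting runway next to burn is one way to convey the "time and urgency" point to non-technical teams.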
Gross Profit & Revenue
Actual revenue and profit after expenses, investors, and ongoing costs. If the business model and product are viable, the company will be able to stand on its own without external capital.
Active Users
Measures the ongoing stickiness of a service or product. Define “active” clearly so that first-time, new, and experimental users are not overcounted. Can the company move beyond early adopters and fans?
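One way to make the definition of “active” explicit is to write it down as code. The sketch below assumes a hypothetical rule (at least two events in the trailing 30 days) and a made-up event log. The threshold and window are illustrative choices, not recommendations from the talk.

```python
from datetime import date, timedelta

# Hypothetical event log: (user_id, day of activity).
events = [
    ("u1", date(2016, 7, 5)),
    ("u1", date(2016, 7, 10)),
    ("u1", date(2016, 7, 20)),
    ("u2", date(2016, 7, 15)),   # a single visit: first-time or experimental user
    ("u3", date(2016, 5, 1)),    # activity outside the window
    ("u3", date(2016, 5, 2)),
]


def active_users(events, as_of, window_days=30, min_events=2):
    # Count events per user inside the trailing window, then apply the threshold.
    start = as_of - timedelta(days=window_days)
    counts = {}
    for user, day in events:
        if start < day <= as_of:
            counts[user] = counts.get(user, 0) + 1
    return {user for user, n in counts.items() if n >= min_events}


active = active_users(events, as_of=date(2016, 7, 31))
print(active)  # {'u1'}
```

Changing `min_events` to 1 would also count the one-time visitor, which is exactly the kind of overcounting an explicit definition helps avoid.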
Churn Rate & Retention
How many people leave or become inactive after a certain period of time? When in the customer’s lifetime is churn more likely to occur? The higher the expected churn rate, the more the company has to spend on acquiring new customers.
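A minimal sketch of churn and retention for one period, assuming the common definition "customers lost during the period divided by customers at the start". The customer sets are hypothetical.

```python
# Hypothetical customer IDs at the start and end of a period.
customers_at_start = {"a", "b", "c", "d", "e"}
customers_at_end = {"a", "b", "c", "f"}   # "f" is new, "d" and "e" left

lost = customers_at_start - customers_at_end
retained = customers_at_start & customers_at_end

# New customers ("f") are excluded: both rates are relative to the starting base.
churn_rate = len(lost) / len(customers_at_start)
retention_rate = len(retained) / len(customers_at_start)

print(sorted(lost), churn_rate, retention_rate)  # ['d', 'e'] 0.4 0.6
```

Running the same calculation per signup cohort or per month of tenure is one way to see when in the customer lifetime churn tends to occur.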
Cumulative Growth
Cumulative growth adds a long-term, sustainable perspective to month-over-month growth alone. A focus on short-term growth can take over and cause decision makers to lose sight of an organization’s mission and goals.
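The contrast between the two views can be shown with a few numbers. The monthly user counts below are hypothetical, and "cumulative growth" is taken here to mean growth from the first month to the latest one, which is an assumption about the intended definition.

```python
# Hypothetical user counts for four consecutive months.
monthly_users = [1000, 1200, 1140, 1500]

# Month-over-month growth: each month compared with the previous one.
mom_growth = [
    (current - previous) / previous
    for previous, current in zip(monthly_users, monthly_users[1:])
]

# Cumulative growth: latest month compared with the first month.
cumulative_growth = (monthly_users[-1] - monthly_users[0]) / monthly_users[0]

print([round(g, 3) for g in mom_growth])
print(cumulative_growth)  # 0.5
```

The month-over-month series swings between gains and a dip, while the cumulative figure summarizes the whole period in a single number.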
Response Time
The amount of time teams take to respond to and complete tasks, including bug fixes, technological improvements, product upgrades, and customer service. Responsiveness demonstrates staff and team dedication, effective allocation of resources, operational effectiveness, and a lack of tech debt.
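Response time can be tracked from ticket timestamps. The sketch below uses a hypothetical list of (opened, closed) pairs and reports the median, chosen here because a single slow ticket would skew an average. Neither the data nor the choice of statistic comes from the talk.

```python
from datetime import datetime
from statistics import median

# Hypothetical tickets: (opened, closed).
tickets = [
    (datetime(2016, 7, 1, 9, 0), datetime(2016, 7, 1, 13, 0)),   # 4 hours
    (datetime(2016, 7, 2, 9, 0), datetime(2016, 7, 3, 9, 0)),    # 24 hours
    (datetime(2016, 7, 4, 10, 0), datetime(2016, 7, 4, 12, 0)),  # 2 hours
]

# Convert each open-to-close interval to hours.
hours_to_close = [
    (closed - opened).total_seconds() / 3600 for opened, closed in tickets
]

median_hours = median(hours_to_close)
print(median_hours)  # 4.0
```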
Customer Lifetime Value (CLV)
Total dollars from a customer over the lifetime of the relationship with that customer. It sits at the intersection of purchase frequency, revenue per customer, and acquisition costs. This measure can have predictive qualities.
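A simple way to combine the three ingredients named above is to multiply order value, purchase frequency, and expected lifetime, then subtract the acquisition cost. This is one common simplification among many CLV formulas, not the talk's own method, and all inputs are hypothetical.

```python
def customer_lifetime_value(avg_order_value, orders_per_year, expected_years, acquisition_cost):
    # Expected revenue over the relationship, net of what it cost to win the customer.
    lifetime_revenue = avg_order_value * orders_per_year * expected_years
    return lifetime_revenue - acquisition_cost


# Hypothetical customer: $25 orders, 6 per year, for 3 years, acquired for $80.
clv = customer_lifetime_value(
    avg_order_value=25.0,
    orders_per_year=6,
    expected_years=3,
    acquisition_cost=80.0,
)
print(clv)  # 370.0
```

Because `expected_years` is an estimate, for example one derived from the churn rate, the result is a forecast rather than an observed total.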
INFLUENCE
How to align and connect
goals and expectations?
"Leadership is the art of giving people
a platform for spreading ideas that
work."
-Seth Godin
Connect the Dots
Communicate, Strategize, Communicate...
Day 1: Good first impressions. Listen and learn!
Day 30: Blueprint for Agile Data Science and Analytics Stack
Day 60: Celebrate improvements to workflow, effectiveness, and access
Day 90: Establish clear measures for success, as widespread as possible
Month 6: Democratize data access and streamline measures for external and internal teams
Month 12: Evaluate milestones, iterate, and grow
Allocate Time & Resources Effectively
[Figure: two charts, “Business as Usual Allocation” and “New Data Science Allocation”, with splits of 20% / 80% and 80% / 20%. Segment labels: Anything Else; Reporting & Urgent Requests; Data Acquisition, Cleaning; Exploration & Analysis, Reports, & Presentation]
GROW YOUR TEAM
When to increase the ability and
capabilities of your team?
Team Members
- Technical Project Manager
- Data Scientist
- Data Engineer
- Data Engineer
- Analyst
- Researcher
Actionable Agile DS/A Stack is Key to Success
Agile Data Science & Analytics Stack: central to the ability to juggle and balance the responsibilities of being the first/lead data scientist.
- Active Listening
- Influence
- Collaborate with Metrics
- Explore
- Implement
- Grow
@DataThinker
WhenThereIsData.com
pauline.chow@gmail.com

Success Through an Actionable Data Science Stack
