Big Data and the Art of Data Science
Andrew B. Gardner, PhD
www.linkedin.com/in/andywocky/
agardner@momentics.com
www.momentics.com
Big Data is Not New
Big Data Challenge
1880 census – 50M people
The First Big Data Solution
• Hollerith Tabulating System
• Punched cards – 80 variables
• Used for 1890 census
• 6 weeks instead of 7+ years
Hollerith Tabulation System
{age, number of insane, …} 7 years → 6 weeks
Image Credit – http://en.wikipedia.org/wiki/File:1880_census_Edison.gif
Image Credit – http://en.wikipedia.org/wiki/File:Hollerith_Punched_Card.jpg
Image Credit – http://en.wikipedia.org/wiki/File:HollerithMachine.CHM.jpg
Big Data Is More Than 3 Vs*
Volume Variety Velocity
*2001 (META Group) / 2012 (Gartner) Definition of Big Data
Volume (IDC Report 2011)
• 8 billion TB in 2015
• 40 billion TB in 2020
• 90% of all data is less than 2 years old
• storage, transport, processing
Variety
• relational, graph, time series, sensor, audio, video, text, geo, scientific, …
• 80% unstructured
Velocity
• facebook 500 TB/day
• Large Hadron Collider 35 GB/sec
• twitter 300K tweets/min
• real time, stream
Big Data Opportunities
“… big data market will grow from $3.2B (2010) to $16.9B (2015)…”
“… gains of 5-6% productivity and profitability …”
“… business volume will double every 1.2 years …”
“… required for companies to stay innovative and competitive …”
“… retail 60% increase in net margin attainable …”
“… manufacturing production costs decrease 50% …”
“… $300B annual savings in healthcare …”
IBM | The Economist | McKinsey & Company | PWC | KPMG | Accenture
Big Data Successes
Walmart
• 10-15% online sales lift
• $1B incremental revenue
• Recommendations
• Engineered content
• 2012 Presidential Election
• Fleet telematics save fuel
What’s Going On?
1: Growth of Data
Amount of data in the world…
2005: 100 EB
2012: 2800 EB
2015: 8000 EB
1 EB = 1 Exabyte = 1 billion GB
… doubles every 2 years
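As a rough check on the growth figures above, the implied doubling time can be computed from two of the data points. This is only a sketch: it assumes steady exponential growth between 2005 (100 EB) and 2012 (2800 EB), and the function name doubling_time_years is illustrative, not something from the talk.

```python
import math

def doubling_time_years(year_a, size_a, year_b, size_b):
    """Years needed to double, assuming steady exponential growth between two points."""
    doublings = math.log2(size_b / size_a)
    return (year_b - year_a) / doublings

# Figures from the slide: 100 EB in 2005, 2800 EB in 2012.
t = doubling_time_years(2005, 100, 2012, 2800)
print(f"implied doubling time: {t:.2f} years")
```

Under that assumption the two points imply a doubling time of roughly a year and a half, so "doubles every 2 years" reads as a conservative rule of thumb rather than an exact fit.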
2: Connectedness & Sources
More non-human nodes online than people
50B+ non-human nodes online
The Internet of Things (IoT)
Source: Swan, M. Sensor Mania! The Internet of Things, Objective Metrics, and the Quantified Self 2.0. J Sens Actuator Netw (2012) 1(3), 217-253.
Data Sources: social, mobile, web, enriched data, science, IoT
3: Demand
Increasing dependence on data.
4: Economics
Attention economy not information economy!
• Data is bountiful
• Storage is cheap
• Computing is cheap
• Analysis is cheap
• Talent is expensive
• Time is expensive
Big Data Disruption
OLD
• define schema
• pour in data
• analyze
→ (few) well calculated questions first
NEW
• collect data
• explore
• schema as needed
→ data first then exploratory decision making
unknown unknowns = insight gold
Better Cycle Times and Better Questions Win!
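The OLD versus NEW contrast on this slide can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the sample events, their field names, and the choice of an in-memory sqlite3 table to stand in for a schema-first warehouse.

```python
import json
import sqlite3
from collections import Counter

# Hypothetical raw events; field names are illustrative only.
raw_events = [
    '{"user": "a", "action": "view", "device": "mobile"}',
    '{"user": "b", "action": "buy", "amount": 20}',
    '{"user": "a", "action": "buy", "amount": 5, "coupon": "X1"}',
]

# OLD: define the schema first, then pour in data; unplanned fields are dropped.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user TEXT, action TEXT)")
for line in raw_events:
    rec = json.loads(line)
    db.execute("INSERT INTO events VALUES (?, ?)", (rec["user"], rec["action"]))
buys = db.execute("SELECT COUNT(*) FROM events WHERE action = 'buy'").fetchone()[0]

# NEW: collect everything, explore, and apply a schema only when needed.
records = [json.loads(line) for line in raw_events]
field_counts = Counter(key for rec in records for key in rec)

print(buys)                  # answers the one question planned in advance
print(sorted(field_counts))  # reveals fields nobody planned for
```

The schema-first path answers the question it was designed for, while the data-first path keeps fields such as the coupon code that nobody thought to ask about up front.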
Rumsfeld Analytics
Things we know we know: Facts – could be wrong.
Things we know we don’t know: Questions – do reporting.
Things we don’t know we know: Intuition – quantify to improve.
Things we don’t know we don’t know: Exploration – unfair advantages.
Goal: data discoveries = insights = game changers = unknown unknowns.
Data Alone is Just An Asset
• Depreciating
• Liability
• Useful lifetime
• Expense
Finished goods create value from raw materials
data
$$ data product $$
Enter the Data Scientist
• mathematical
• developer
• data talented
• problem solver
• insight whisperer
• product savvy
Source: FICO Infographic
data + data scientist
$$ data product $$
A Brief History of Data Science
BC - The Greeks
1974 Peter Naur @ UoC
2001 William S. Cleveland @ CSU
2003 Journal of Data Science
2009 Jeff Hammerbacher @Facebook
2010 Hilary Mason & Chris Wiggins @ Dataists
2010 Mike Loukides @ O'Reilly
2011 DJ Patil @ LinkedIn
Famous Definitions – New Blend
Conway’s “Data Science” Venn Diagram (2010)
Image credit: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
new skill blend:
one stop rock star
Famous Definitions – Skeptic
[… with a great salary]
Famous Definitions – Comparison
Many Flavors of Data Scientist
Alternatively, Data Roles × Skill Sets
… from research to development to business-focused
Axes: role × skill (2012-3 Survey)
Source / Image Credit: H. Harris, S. Murphy, M. Vaisman. “Analyzing the Analyzers.” O’Reilly Media, Jun 2013.
Universal Agreement: Scarcity
McKinsey prediction for 2018:
• Huge shortage of analytic talent (140K+)
• Gap of 1.5M managers who can make decisions based on data analysis
• Talent is the biggest resource
• There is a raging talent war
Source: J. Manyika et al., “Big data: The next frontier for innovation, competition, and productivity.” McKinsey Global Institute (2011).
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
The Data Scientist’s Craft
• Discover unknown unknowns in data
• Obtain predictive, actionable insight
• Communicate business data stories
• Build business decision confidence
• Create valuable Data Products
Valuable & Reusable Data Products
Image credit: Harlan Harris
Building Data Products
• Objectives – What outcome am I trying to achieve?
• Levers – What inputs can we control?
• Data – What data can we collect?
• Models – How do the levers impact the data?
Source / Adapted From: J. Howard. “Designing Great Data Products.” O’Reilly Media, Mar 2012.
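The four steps above can be sketched with a toy pricing problem. All of it is assumed for illustration: the objective (revenue), the lever (price), the made-up observations, and the straight-line demand model are not from the talk or from the cited source.

```python
# Hypothetical observations of (price, units sold); values are made up for illustration.
observations = [(8.0, 120.0), (10.0, 100.0), (12.0, 80.0), (14.0, 60.0)]

def fit_line(points):
    """Ordinary least squares fit of units = intercept + slope * price."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def best_price(points, candidates):
    """Pick the candidate price (the lever) with the highest predicted revenue (the objective)."""
    intercept, slope = fit_line(points)

    def revenue(price):
        # Model: how the lever changes the data we expect to see (units sold).
        return price * max(0.0, intercept + slope * price)

    return max(candidates, key=revenue)

candidates = [p / 2 for p in range(10, 41)]  # 5.0, 5.5, ..., 20.0
print(best_price(observations, candidates))
```

The point of the sketch is the ordering: the objective and the lever are fixed first, and the data and model exist only to say which lever setting serves the objective best.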
Data Product Aims
[Diagram labels around “data”: provide, increase, open, new, improve]
Some Data Products
fitbit
flu tracker
amazon
traffic ads
SIRI
How Do Data Scientists Do It?
• Tools
• Workflow
• Creativity
Data Science Tools
• Java, R, Python
• Hadoop, HDFS, MapReduce, Spark, Storm
• HBase, Pig, Hive, Shark, Impala
• ETL, Webscrapers, Flume, Sqoop
• SQL, RDBMS, DW, OLAP
• Weka, RapidMiner, numpy, scipy, pandas
• D3.js, ggplot2, Wakari, Tableau, Flare, Shiny
• SPSS, Matlab, SAS
• NoSQL, MongoDB, Redis, ..
• MS-Excel
• Machine Learning
• ...
Data Science Workflow
Source: Josh Wills, Senior Director of Data Science, Cloudera. “From the Lab to the Factory: Building a Production Machine Learning Infrastructure.”
+ creative exploration
Data Science Creativity
TECHNOLOGY
(feasibility)
BUSINESS
(viability)
HUMAN VALUES
(usability, desirability)
1. Design thinking
2. Scientific method
3. Lots of ideas
4. Inspiration
5. Perspiration
Challenges for Data Scientists
• Stakeholder naïveté
– 2-3 days, right?
• Red tape
– No access allowed
• Terminology
– What’s a wonkulator?
• Real world data
– Messy, noisy, missing, …
• Unknown need
– What’s the business goal?
• Stakeholder alignment
– CMO, CIO, Prod, DevOps
• Analysis distrust
– … but I don’t like that result
Some Practical Tips
• Rapid Iteration: implement, feedback, implement
• Visualize, Draw, Sketch, Share
• Start Simple, Start Small
• Goal, But Not Perfection
Big Data Science & Sensemaking
Source: HP “Monetizing Big Data” Perspective.
A Final Word of Caution
[Hype cycle chart: expectations vs. time, moving through hype, hope, and happy; big data and cloud computing are marked on the curve; time axis labels 2013 and 2018-2023]
Adapted from: Gartner’s 2013 Hype Cycle Special Report (Jul 2013).
Notable Quotes
Simple models and a lot of data trump more elaborate models based on less data.
- Peter Norvig
In God we trust, all others bring data.
- W.E. Deming
Big data is not about the data! The value in big data [is in] the analytics.
- Harvard Prof. Gary King
Conclusion
• Data is an asset, talent is a more valuable asset.
• Big data represents a disruptive shift.
• Data science is the magic enabler via Data Products.
• Better + faster explorations & questions win.
Andrew B. Gardner, PhD
http://linkd.in/1byADxC
agardner@momentics.com
www.momentics.com



Editor's Notes

  • #3: Herman Hollerith. Obsolete. 1880 – 50,189,209; 1890 – 62,947,714
  • #4: ~ 15 mins via 10Gbps LAN to transfer 1 TB; ~ 220 hrs for 1 PB => move the servers?
  • #23: Harlan Harris
  • #34: Data is the new currency of business. Understand customer use, behavior, and interests: targeted products and marketing offers. Understand customer experience across network, services, and social conversation: network optimization. Connect with OTT players, advertisers, and verticals: new business models.
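
The back-of-the-envelope transfer times in note #4 can be reproduced with a short calculation. This is a sketch under stated assumptions: decimal units (1 TB = 10^12 bytes), a fully used 10 Gbps link, and no protocol overhead; the name transfer_seconds is illustrative.

```python
def transfer_seconds(num_bytes, link_bits_per_second):
    """Ideal transfer time, ignoring protocol overhead and contention."""
    return num_bytes * 8 / link_bits_per_second

TB = 10**12        # decimal terabyte (assumption)
PB = 10**15        # decimal petabyte (assumption)
LINK = 10 * 10**9  # 10 Gbps

print(f"1 TB: {transfer_seconds(TB, LINK) / 60:.1f} minutes")
print(f"1 PB: {transfer_seconds(PB, LINK) / 3600:.0f} hours")
```

The ideal figures come out slightly under the note's rounded values, which is what one would expect once real-world overhead is added. At petabyte scale the transfer takes more than nine days, which is the reasoning behind "move the servers?".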