BIG DATA SCIENCE
Chandan Rajah – CEO, Parallel AI
“The price of light is far less than the cost of darkness”
BENEFITS OF BIG DATA
COST SPEED
AGILITY CAPABILITY
BIG DATA JOURNEY
WHAT WHY WHERE HOW
What is Big Data?
Big Data ≠ Data Volume
Big Data = Crude Oil
Think of data like ‘Crude Oil’
Big Data is about extracting ‘crude oil’; transporting it in ‘pipelines’; storing it in ‘mega tanks’
Source: Data Science London
What is Data Science?
Data Science ≠ Statistical Analysis
Data Science = Oil Refinery
Data science is about ‘treating’ the data; applying ‘science’ to it; refining the ‘results’; and combining them to form ‘insight’
Source: Data Science London
What is the Big Data Science Toolkit?
• Scala, Java, Python, R… (bonus: Clojure, Haskell, Erlang)
• Hadoop, HDFS, MapReduce… (bonus: Spark, Storm, Tez)
• Scalding, HBase, Hive… (bonus: Shark, Titan, Giraph)
• Flume, Sqoop, ETL, Webscrapers… (bonus: Hume)
• SQL, RDBMS, DW, OLAP… (bonus: SOLR, ElasticSearch)
• KNIME, Weka, RapidMiner… (bonus: SciPy, NumPy, Pandas)
• D3.js, Kibana, ggplot2, Flare… (bonus: Shiny, Datameer)
• NoSQL, MongoDB, Cassandra, CouchDB
• And sometimes… MS Excel
Source: Data Science London
Knowns, Unknowns & DIKUW FTW!
known knowns: we know we know
known unknowns: we know we don’t know
unknown unknowns: we don’t know we don’t know
D I K U W
DATA: raw; numbers, letters, symbols, signals
INFORMATION: what; description, context, relationship, reports
KNOWLEDGE: how to; experience, tested, instruction, programs
UNDERSTANDING: why; cause & effect, proven, models
WISDOM: when; prediction, what’s best
PAST → FUTURE
Roles: Data Engineer, Data Analyst, Data Miner, Data Scientist
Scope: known knowns, known unknowns, unknown unknowns
Source: Data Science London
Business Intelligence to Data Discovery?
data you know / data you don’t know
questions you’re asking / questions you’re not asking
Data Analyst: Business Intelligence
Data Scientist: Data Discovery
DATA MODELLING
Y ← F( X, random noise, parameters )
ALGORITHMIC MODELLING
Y ← [ BLACK BOX ] ← X
Source: Applied Data Labs & Leo Breiman
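The contrast between the two modelling cultures can be illustrated with a minimal Python sketch. It is not taken from the deck: the synthetic data, the straight-line form assumed for F, and the nearest-neighbour stand-in for the black box are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Data modelling: assume y = a*x + b + noise and estimate the parameters a, b
    by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def knn_predict(xs, ys, x_new, k=3):
    """Algorithmic modelling: assume no functional form; predict by averaging the
    responses of the k observations closest to x_new."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

a, b = fit_line(xs, ys)                    # interpretable parameters
line_prediction = a * 5.0 + b              # prediction from the assumed form
knn_prediction = knn_predict(xs, ys, 5.0)  # prediction from the "black box"
```

Both approaches give a similar prediction here, but only the first yields parameters that can be read and reasoned about; the second trades that interpretability for freedom from an assumed form.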
Why is Big Data needed?
VOLUME: Exponential growth; 2x in 2 yrs. PB (1000 TB) is now common
VELOCITY: Event streams; never at rest. 640k GB per internet minute
VARIETY: 100s of data sources. 85% not in a table
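The volume and velocity figures above can be turned into quick arithmetic. The sketch below is illustrative only: the function names and the 1,000 TB starting point are assumptions, while the doubling period, the per-minute rate and the PB = 1000 TB convention come from the slide.

```python
GB_PER_INTERNET_MINUTE = 640_000  # figure quoted on the slide
GB_PER_PB = 1_000_000             # 1 PB = 1000 TB = 1,000,000 GB


def projected_volume(initial_tb, years, doubling_period_years=2.0):
    """Volume after `years` if data doubles every `doubling_period_years`:
    V(t) = V0 * 2 ** (t / T)."""
    return initial_tb * 2 ** (years / doubling_period_years)


def pb_per_day(gb_per_minute=GB_PER_INTERNET_MINUTE):
    """Convert a per-minute rate in GB to a per-day rate in PB."""
    return gb_per_minute * 60 * 24 / GB_PER_PB


# Hypothetical starting point: 1,000 TB (1 PB) today, projected ten years out.
volume_in_ten_years_tb = projected_volume(1_000, 10)
daily_internet_traffic_pb = pb_per_day()
```

Doubling every two years means five doublings in a decade, so the hypothetical 1 PB store grows 32-fold over that period.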
Big Data Heat Map – Gartner 2012
Big Data Potential by Sector – McKinsey for USBLS, 2011
Big Data Investment by Industry – Gartner, 2012
Top Big Data Challenges – Gartner, 2012
CIO Survey on Big Data Investments – IDG Survey, 2013
CIO Survey on Main Drivers to Invest – IDG Survey, 2014
How will Big Data Evolve?
EXTERNAL ALIGNMENT INTERNAL COHERENCE
Align with Existing BI; Maximise Value
Exploit Capability; Respond Rapidly
Focus; Innovate; Stay Ahead
Repeat; Stabilize; Governance
RECAP OF BENEFITS
COST SPEED
AGILITY CAPABILITY
LAST WORDS OF WISDOM
NOT ALL ROADS LEAD TO ROME
TIME VALUE OF DATA
KNOWLEDGE IS POWER
I AM AN INDIVIDUAL
“The price of light is far less than the cost of darkness”


Big Data Science: Intro and Benefits


Editor's Notes

  • #3: COST – 20x less per TB vs. Teradata, Netezza, Oracle; 75% less average marginal cost per capacity
    SPEED – 10x faster than Teradata, Netezza
    AGILITY – 115% lower average cost per data source vs. Oracle
    SCIENCE – Machine learning, prediction
  • #4: WHAT - What is Big Data Science?
    WHY - Why is it needed?
    WHERE - Where is it being used?
    HOW - How will it evolve?
  • #21: COST – 20x less per TB vs. Teradata, Netezza, Oracle; 75% less average marginal cost per capacity
    SPEED – 10x faster than Teradata, Netezza
    AGILITY – 115% lower average cost per data source vs. Oracle
    SCIENCE – Machine learning, prediction
  • #22: TIME VALUE - Yesterday’s data is less valuable than today’s data, but historical data is more valuable than present data alone
    POWER - Getting from unknown unknowns to known unknowns or known knowns is powerful
    LEAD TO ROME - Exploring with no direct business impact is not a bad thing
    INDIVIDUAL - Treat every customer as an individual, not an aggregate, and analyse accordingly; aggregate only individual insights
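
The last note, on analysing individuals before aggregating, can be illustrated with a small Python sketch. The customer names, spend values and the simple first-to-last trend measure are all hypothetical, chosen only to show how aggregating raw data first can hide what per-customer analysis reveals.

```python
from statistics import mean

# Hypothetical monthly spend per customer (illustrative values only).
spend = {
    "customer_a": [10, 20, 30],
    "customer_b": [30, 20, 10],
}


def trend(values):
    """Per-individual insight: change from the first to the last observation."""
    return values[-1] - values[0]


# Aggregate first: the monthly averages are flat, so opposite trends cancel out.
monthly_average = [mean(month) for month in zip(*spend.values())]

# Individual first: derive one insight per customer, then aggregate the insights.
individual_trends = {name: trend(values) for name, values in spend.items()}
rising = sum(1 for t in individual_trends.values() if t > 0)
falling = sum(1 for t in individual_trends.values() if t < 0)
```

In this toy case the aggregate view shows no change at all, while the individual view shows one customer rising and one falling.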