BIG DATA SCIENCE
“The price of light is far less than the cost of darkness”
Chandan Rajah [ @ChandanRajah ]
BENEFITS OF BIG DATA
COST SPEED
AGILITY CAPABILITY
Steps to the EPIPHANY
WHERE
WHAT WHY
DEMO
What is Big Data ?
Big Data ≠ Data Volume
Big Data = Crude Oil
Think of data like ‘Crude Oil’
Big Data is about extracting ‘crude oil’; transporting it in ‘pipelines’; storing it in ‘mega tanks’
What is Data Science ?
Data Science ≠ Statistical Analysis
Data Science = Oil Refinery
Data science is about ‘treating’ data; applying ‘science’ to the data;
refining the ‘results’; and combining them to form ‘insight’
Knowns, Unknowns & DIKUW FTW!
known knowns
we know we know
known unknowns
we know we don’t know
unknown unknowns
we don’t know we don’t know
DATA (D): raw; numbers, letters, symbols, signals
INFORMATION (I): what; description, context, relationship, reports
KNOWLEDGE (K): how to; experience, tested, instruction, programs
UNDERSTANDING (U): why; cause & effect, proven, models
WISDOM (W): when; prediction, what’s best
PAST FUTURE
Data Engineer Data Analyst Data Miner Data Scientist
known knowns
known unknowns unknown unknowns
Data Analytics to Data Discovery ?
data you know
data you don’t know
questions you’re asking
questions you’re not asking
Data Analyst
Data Scientist
Data Analytics
Data Discovery
DATA MODELLING
Y ← F( X, random noise, parameters )
ALGORITHMIC MODELLING
Y ← [ BLACK BOX ] ← X
DIVIDE
SCATTER
Split Data into Blocks
Replicate and Store
Petabytes of Resilience
CONQUER
EXPLORE
1000s of Parallel Threads
Explore Every Path
Machine Learning
INSIGHT
GATHER
Real Time Action
Periodic Dashboards
Iterative Evolution
What is the Big Idea ?
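Divide = HDFS
WRITE
Client: 1. Create Metadata (Name Node); 2. Put Blocks (Data Nodes)
Name Node: Control / Monitoring of Data Nodes
[Diagram: blocks 1 to 3, each replicated across the Data Nodes]
READ
Client: 1. Get Metadata (Name Node); 2. Fetch Blocks (Data Nodes)
Name Node: Control / Monitoring of Data Nodes
[Diagram: blocks 1 to 4, each replicated across the Data Nodes]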
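The write and read paths above can be imitated with a small in-memory sketch. It is a toy under stated assumptions, not the HDFS API: `write`, `read`, the node names, the 4-byte block size and the round-robin placement are all hypothetical, and real HDFS uses far larger blocks and rack-aware placement.

```python
BLOCK_SIZE = 4     # bytes per block; tiny on purpose (assumption)
REPLICATION = 3    # copies kept of each block (assumption)

def write(data, name_node, data_nodes):
    """Split data into blocks, replicate each block, record where it went."""
    node_ids = sorted(data_nodes)
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for block_id, block in enumerate(blocks):
        # Round-robin placement: replicas go to consecutive nodes.
        first = block_id % len(node_ids)
        targets = [node_ids[(first + r) % len(node_ids)]
                   for r in range(REPLICATION)]
        name_node[block_id] = targets          # 1. create metadata
        for node in targets:
            data_nodes[node][block_id] = block  # 2. put blocks

def read(name_node, data_nodes, failed=()):
    """Look up block locations, then fetch each block from a live node."""
    parts = []
    for block_id in sorted(name_node):          # 1. get metadata
        node = next(n for n in name_node[block_id] if n not in failed)
        parts.append(data_nodes[node][block_id])  # 2. fetch blocks
    return b"".join(parts)

name_node = {}   # block id -> list of node ids holding a replica
data_nodes = {n: {} for n in ("dn1", "dn2", "dn3", "dn4")}

write(b"petabytes of resilience", name_node, data_nodes)
restored = read(name_node, data_nodes, failed={"dn1"})
print(restored)
```

Because every block lives on three of the four nodes, the read still succeeds with `dn1` marked as failed, which is the resilience the slide refers to.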
Conquer = MapReduce
Insight = Functional Paradigm
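A word count is the usual way to show how MapReduce leans on the functional paradigm. The sketch below runs in a single process and only mimics the map, shuffle and reduce stages; the function names and the sample lines are assumptions for illustration, not Hadoop code.

```python
from collections import defaultdict
from functools import reduce

def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as a framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Fold the grouped values into one result per key.
    return key, reduce(lambda a, b: a + b, values)

lines = ["big data is crude oil", "data science is the refinery"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Both `map_phase` and `reduce_phase` are pure functions with no shared state, which is what lets a cluster run thousands of them in parallel on separate blocks.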
Steps to the EPIPHANY
WHERE
WHAT WHY
DEMO
Why is Big Data needed ?
VOLUME VELOCITY VARIETY
Exponential growth; 2x in 2 yrs
PB (1000 TB) is now common
Event streams; never at rest
640k GB per internet minute
100s of data sources
85% not in a table
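The volume and velocity figures can be turned into rough daily and multi-year numbers. This is back-of-envelope arithmetic on the quoted values only; the decimal conversion of 1 TB = 1000 GB and the helper name `growth_factor` are assumptions.

```python
GB_PER_MINUTE = 640_000     # "640k GB per internet minute"
GB_PER_PB = 1_000 * 1_000   # 1 PB = 1000 TB, assuming 1 TB = 1000 GB

# Velocity: how many petabytes that rate adds up to in one day.
pb_per_day = GB_PER_MINUTE * 60 * 24 / GB_PER_PB

def growth_factor(years, doubling_period=2):
    # Volume: "2x in 2 yrs" compounds to 2 ** (years / doubling_period).
    return 2 ** (years / doubling_period)

print(pb_per_day)         # 921.6
print(growth_factor(10))  # 32.0
```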
Where in the Value Chain ?
Generation Transport Knowledge Output Value
BIG DATA SCIENCE
Straddles all four Challenge Areas
Steps to the EPIPHANY
WHERE
WHAT WHY
DEMO
Big Data Heat Map – Gartner 2012
Big Data Potential by Sector – McKinsey for USBLS, 2011
Big Data Investment by Industry – Gartner, 2012
Top Big Data Challenges – Gartner, 2012
Survey on Big Data Investments – IDG Survey, 2013
Survey on Main Drivers to Invest – IDG Survey, 2014
Steps to the EPIPHANY
WHERE
WHAT WHY
DEMO
DEMO
RECAP OF BENEFITS
COST SPEED
AGILITY CAPABILITY
LAST WORDS OF WISDOM
NOT ALL ROADS LEAD TO ROME
TIME VALUE OF DATA KNOWLEDGE IS POWER
I AM AN INDIVIDUAL
“The price of light is far less than the cost of darkness”

Steps to the Big Data Science Epiphany

Editor's Notes

  • #3, #25: COST: 20x less per TB vs. Teradata, Netezza, Oracle; 75% less average marginal cost per capacity. SPEED: 10x faster than Teradata, Netezza. AGILITY: 115% lower average cost per data source vs. Oracle. SCIENCE: machine learning, prediction.
  • #4, #13, #16, #23: WHAT: What is Big Data Science? WHY: Why is it needed? WHERE: Where is it being used? HOW: How will it evolve?
  • #26: TIME VALUE: Yesterday’s data is less valuable than today’s data; historical data is more valuable than the present alone. POWER: Getting from unknown unknowns to known unknowns or known knowns is powerful. LEAD TO ROME: Exploring with no direct business impact is not a bad thing. INDIVIDUAL: Treat every customer as an individual, not an aggregate, when analysing; aggregate only individual insights.