Big Data - Module 1
WHAT IS BIG DATA?
What is Big Data?
• Huge amounts of data are available that hold meaningful
insights, and they need to be analysed.
• Existing Online Transaction Processing (OLTP) and Business
Intelligence (BI) systems do not scale easily in terms of cost,
effort, and manageability.
• It is not just the volume, but also the variety and velocity of data.
• Big data is a term for the challenges we face due to the
exponentially growing volume, variety and velocity of data.
Three V’s of Big Data
THE CHALLENGE
Background
Shorter Time to React
• Data that enters your organization has value only for a
limited window of time.
• This window usually shuts well before the data has been
transformed and loaded into a data warehouse for deeper
analysis.
• The higher the volume of data entering your organization per
second, the bigger the challenge.
Data Economics
• Why is volume good?
– No individual record is particularly valuable
– Having every record is incredibly valuable
• Why is the storage decision important?
• How much value can I extract from every byte of data versus
the cost of saving that data?
– If value > cost: keep it online, in a DB or on a filer
– If cost > value: discard it or archive it to tape (throwing
data away is expensive)
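The value-versus-cost rule above can be sketched in a few lines of Python. This is only an illustration: the `Dataset` fields, the per-byte figures, and the choice to archive when value and cost are equal are assumptions, not part of the slides.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    size_bytes: int
    value_per_byte: float        # hypothetical estimate of extractable value
    online_cost_per_byte: float  # hypothetical cost of keeping a byte online


def storage_decision(ds: Dataset) -> str:
    """Apply the slide's rule: keep data online only if value exceeds cost."""
    value = ds.size_bytes * ds.value_per_byte
    cost = ds.size_bytes * ds.online_cost_per_byte
    # Ties are sent to the archive branch; the slides do not cover that case.
    return "keep online" if value > cost else "archive or discard"


# Made-up figures for two one-terabyte data sets.
clickstream = Dataset("clickstream", 10**12, 2e-9, 5e-10)
old_logs = Dataset("old_logs", 10**12, 1e-10, 5e-10)

for ds in (clickstream, old_logs):
    print(ds.name, "->", storage_decision(ds))
```

In practice the hard part is estimating the value per byte, which is why the slides argue for keeping every record when storage is cheap enough.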
Data Storage
Schema                  | Structured                       | Unstructured
Storage medium          | RDBMS                            | Filers
Storage reliability     | Very reliable                    | Very reliable
Processing ability      | Very reliable                    | Unstructured schema poses challenges
Location of processing  | SQL queries pull data to server  | Random means to retrieve sense
Impact of data increase | Cost increases linearly          | Cost increases linearly
Support for Big Data    | No                               | No
BIG DATA’S APPROACH
Big Data Approach
Big Data refers to
technologies that
can capture, process
and analyze data.
NoSQL Database Types
• Key-value store
– The key can be custom or auto-generated
– The value can be a complex object such as XML, a BLOB,
or JSON
– Popular: DynamoDB, Azure Table Storage (ATS), Riak
• Column store
– Data is stored as families of columns; highly scalable,
with a high-performance architecture
– Examples: HBase, Cassandra, Vertica and Hypertable
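As a rough illustration of the two models above, the sketch below builds a tiny in-memory key-value store (custom or auto-generated keys, opaque values) and a nested dictionary laid out as column families. The class, the function, and the sample data are hypothetical; real systems such as DynamoDB or HBase expose very different APIs and storage engines.

```python
import json
import uuid


class KeyValueStore:
    """Toy key-value store: values are opaque, keys are custom or generated."""

    def __init__(self):
        self._data = {}

    def put(self, value, key=None):
        if key is None:
            key = uuid.uuid4().hex  # auto-generated key
        self._data[key] = value
        return key

    def get(self, key):
        return self._data[key]


store = KeyValueStore()
store.put(json.dumps({"name": "Ada", "tier": "gold"}), key="customer:42")
auto_key = store.put(b"\x00\x01binary blob")  # a BLOB stored under a generated key

# Column-family layout: row key -> column family -> column -> value.
rows = {
    "customer:42": {
        "profile": {"name": "Ada", "tier": "gold"},
        "activity": {"last_login": "2014-01-01"},
    }
}


def read_family(rows, family):
    """Read a single column family for every row, ignoring the others."""
    return {row_key: families.get(family, {}) for row_key, families in rows.items()}


print(store.get("customer:42"))
print(read_family(rows, "profile"))
```

The key-value store never inspects the value, while the column-family layout lets a query touch only the group of columns it needs.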
NoSQL Database Types
• Document database
– Designed to store, retrieve and manage document-oriented
information; expands on the key-value store
– Examples: MongoDB, CouchDB
• Graph database
– Designed for data whose relations are well
represented as graphs, usually with nodes
connected by edges
– Examples: Neo4j and Polyglot
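To make the graph model concrete, here is a minimal sketch that stores nodes and edges as an adjacency list and walks the relations with a breadth-first search. The names and edges are invented for illustration, and this plain-Python structure is not how Neo4j or any other graph database stores or queries data.

```python
from collections import deque

# Hypothetical "knows" relations between people.
edges = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]


def build_graph(edges):
    """Build an undirected adjacency list: node -> set of neighbouring nodes."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, start):
    """Return every node connected to `start` through any chain of edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


graph = build_graph(edges)
print(sorted(reachable(graph, "alice")))
```

Relationship questions such as "who is connected to whom" become a traversal over edges, where a relational schema would typically need repeated joins.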
Analytical Database
• An analytical database is a type of database built to store,
manage, and consume big data.
• It is optimized for advanced analytics involving
highly complex queries on terabytes of data, as well as complex
statistical processing, data mining, and NLP (natural language
processing).
• Examples of analytical databases include Vertica (acquired by HP),
Aster Data (acquired by Teradata), and Greenplum (acquired by
EMC).
BIG DATA USE CASE PATTERNS
Preprocess & Store
• Scenario
– Data is continuously generated in large volumes
– It needs to be pre-processed before loading into target systems
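A minimal sketch of this pattern, assuming a made-up comma-separated record format (`user,event,amount`): records are cleaned one at a time as they arrive and grouped into small batches that a loader could write to the target system. The field names, the batch size, and the rule of dropping malformed lines are all assumptions for illustration.

```python
from itertools import islice


def preprocess(raw_lines):
    """Parse and normalise raw lines lazily, skipping malformed records."""
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # wrong number of fields
        user, event, amount = parts
        try:
            yield {"user": user, "event": event.lower(), "amount": float(amount)}
        except ValueError:
            continue  # amount is not a number


def batches(records, size):
    """Group an iterable of records into lists of at most `size` items."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


raw = ["u1,CLICK,0", "bad line", "u2,PURCHASE,19.99", "u3,purchase,abc", "u1,PURCHASE,5"]
loaded = list(batches(preprocess(raw), 2))  # stand-in for loading into a target system
print(loaded)
```

Because both steps are generators, nothing is held in memory beyond the current batch, which matters when the input never stops.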
Real Time Actions
• Scenario
– Manage actions to be taken
on continuously changing
data in real time
Credit Card Issuer
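The slide names a credit card issuer as the example but its details did not survive as text, so the following is only a hypothetical sketch of a real-time action: each transaction is checked against the card's spending over a sliding time window and is approved or flagged immediately. The class name, the 60-second window, and the limit are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class SpendMonitor:
    """Flag a card when its spend inside a sliding time window exceeds a limit."""

    def __init__(self, window_seconds=60, limit=1000.0):
        self.window_seconds = window_seconds
        self.limit = limit
        self._events = defaultdict(deque)  # card -> deque of (timestamp, amount)
        self._totals = defaultdict(float)  # card -> running total inside the window

    def observe(self, card, timestamp, amount):
        events = self._events[card]
        events.append((timestamp, amount))
        self._totals[card] += amount
        # Evict transactions that have fallen out of the window.
        while events and events[0][0] <= timestamp - self.window_seconds:
            _, old_amount = events.popleft()
            self._totals[card] -= old_amount
        return "review" if self._totals[card] > self.limit else "approve"


monitor = SpendMonitor(window_seconds=60, limit=1000.0)
decisions = [
    monitor.observe("card-1", 0, 600.0),
    monitor.observe("card-1", 30, 500.0),
    monitor.observe("card-1", 100, 200.0),
]
print(decisions)
```

The decision is made as each event arrives, using only a small amount of per-card state, instead of waiting for the data to reach a warehouse.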
Sears – Competes on Big Data
• Sears has data on over 100 million customers, which it
analyses to make relevant, real-time offers to them.
• The solution was 3 years in the making and included
programming to capture, analyze, and report on
customer activity at an individual level across all 4,000
locations.
• Sears has a 300-node Hadoop cluster populated
with over 2 petabytes of structured customer transaction data,
sales data and supply chain data.
• Results: Sears achieved an active member base in the 8 digits,
exceeding the projected 36-month membership target in 17
months.
THE FUTURE OF BIG DATA
Compound Annual Growth Rate
IDC Report Analysis
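For readers unfamiliar with the term, compound annual growth rate (CAGR) is the constant yearly rate that would take a starting value to an ending value over a given number of years: (end / start) ^ (1 / years) - 1. The sketch below computes it; the sample figures are made up and are not taken from the IDC report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical market size growing from 100 to 121 units over 2 years.
rate = cagr(100, 121, 2)
print(f"{rate:.1%}")
```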
Careers in Big Data
THE END
Next : Hadoop
