Big Data Analytics
What Is Big Data Analytics?
● Big Data
– Buzz word
– Two definitions:
● Data sets too large for modern relational databases
● Semi-structured/Unstructured data sets
● Analytics
– The science of measuring and discovering patterns
and trends with data
Big Data Analytics - Introduction
Source: http://www.socialtalent.co/blog/big-data-whats-the-big-deal
Data, Data, Everywhere...
● In 2004:
– Internet traffic: 1 Exabyte (that's 134,217,728 8 GB
flash drives)
– A lot of other media:
● Newspapers/books/magazines
● DVDs
Data, Data, Everywhere...
● Today:
– Internet traffic: 1.3 Zettabytes (that's roughly
178.7 billion 8 GB sticks)
● 110.3 exabytes per month
– Even more media:
● Mobile devices (phones/tablets/mp3 players/etc)
● The Internet of Things
● Streaming Media
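The flash-drive comparisons on the two "Data, Data, Everywhere..." slides can be checked with a few lines of integer arithmetic. This is a minimal sketch that assumes binary units (1 GB = 2^30 bytes, 1 Exabyte = 2^60 bytes, 1 Zettabyte = 2^70 bytes); the function name is made up for illustration.

```python
# Assumption: binary units, i.e. 1 GB = 2**30 bytes, 1 EB = 2**60, 1 ZB = 2**70.
GB = 2**30
EB = 2**60
ZB = 2**70

DRIVE_BYTES = 8 * GB  # one 8 GB flash drive


def drives_needed(total_bytes: int) -> int:
    """Whole number of 8 GB drives that fit into total_bytes."""
    return total_bytes // DRIVE_BYTES


# 2004: 1 Exabyte of Internet traffic.
drives_2004 = drives_needed(1 * EB)

# Today (per the slide): 1.3 Zettabytes, written as 13/10 to stay in integers.
drives_today = drives_needed(13 * ZB // 10)

print(f"{drives_2004:,}")   # 134,217,728
print(f"{drives_today:,}")  # 178,670,639,513
```

With decimal units (1 GB = 10^9 bytes) the counts come out somewhat different, so the comparison is best read as an order-of-magnitude illustration.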
The Internet of Things
● How many of you have...
– Fitness trackers?
– E-readers?
– iPods?
● Tie them to social sites (e.g., Facebook)?
The Internet of Things
● You're being tracked!
● So what?
– Marketing
– Medical
– Government
● Building a fuller picture of what's tracked.
Social Network Integration
Six Degrees of Separation
Source: http://www.83toinfinity.com
Source: http://www.math.cornell.edu/~numb3rs/blanco/social_net.jpg
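"Six degrees of separation" is a statement about shortest paths in a social graph. A minimal sketch of how the number of hops between two people could be measured, using breadth-first search over a small friendship graph. The names and the `degrees_of_separation` helper are hypothetical and not taken from the slides.

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Return the number of hops on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == goal:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


# Hypothetical friendship graph (adjacency lists, undirected).
friends = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Ann", "Dev"],
    "Cara": ["Ann"],
    "Dev": ["Bob", "Eve"],
    "Eve": ["Dev"],
    "Zed": [],  # not connected to anyone
}

print(degrees_of_separation(friends, "Ann", "Eve"))  # 3 (Ann -> Bob -> Dev -> Eve)
print(degrees_of_separation(friends, "Ann", "Zed"))  # None
```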
Data Storage
● Relational Databases
– Structured data
– Can scale to huge volumes of data
● Hadoop
– Semi-structured/unstructured data
– Massively parallel storage and processing
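Hadoop's classic processing model is MapReduce. The sketch below imitates its three phases (map, shuffle, reduce) in a single Python process to show the shape of the computation. It is an illustration only, not Hadoop's API: in a real cluster the map and reduce calls run in parallel on many machines, and all function names here are made up.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big analytics", "data data"])
print(counts)  # {'big': 2, 'data': 3, 'analytics': 1}
```

Because each line is mapped independently and each key is reduced independently, the work can be split across machines, which is what makes the model scale.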
Relational Database
Source: http://www.ntu.edu.sg/home/ehchua/programming/sql/images/ManyToOne.png
Unstructured Data
Source: http://storagegaga.com/2011/12/
Semi-structured
Source: http://www.stylusstudio.com/images/figures/sql_xml_xml_fragment.gif
What Solution to Pick?
● Data Volume and Speed
– Relational Databases Will Cap Out
– “Big Data” Stores Scale (For Now)
● Hadoop
● Spark
● Lucene
– Alternative Modeling Techniques
● Hyper Normalized (6-8NF)
– Inmon's Textual Disambiguation
– Anchor Modeling
– Data Vault
Hadoop
● Version 1
– Giant data store
– File distribution
– File parsing tools
– Generic security
● Version 2
– Giant data store
– Replaced foundation work (YARN took over resource management)
– Unified security: LDAP/Kerberos support
Tools
● Oozie
● Hive
● NoSQL Databases
– HBase
– MongoDB
JSON
{
"employees": [
{ "firstName":"John" , "lastName":"Doe" },
{ "firstName":"Anna" , "lastName":"Smith" },
{ "firstName":"Peter" , "lastName":"Jones" }
]
}
Source: http://www.w3schools.com/json/json_syntax.asp
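A short sketch of reading the employees document above with Python's standard `json` module. The variable names are illustrative.

```python
import json

# The semi-structured document from the slide, as a string.
raw = """
{
  "employees": [
    { "firstName": "John",  "lastName": "Doe" },
    { "firstName": "Anna",  "lastName": "Smith" },
    { "firstName": "Peter", "lastName": "Jones" }
  ]
}
"""

doc = json.loads(raw)  # parse into nested dicts and lists

# No fixed schema is needed: fields are looked up by name at read time.
last_names = [employee["lastName"] for employee in doc["employees"]]
print(last_names)  # ['Doe', 'Smith', 'Jones']
```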
How to Analyze?
● Performance
● Timeliness
● Accuracy
● Feedback
“Big Data” Solutions
● Search the entire data set
● Great performance
● Highly accurate
● Integrates into Analytics tools
– Only some analytics tools support Hadoop and
similar stores
Statistics
● Designed for all sizes of data sets
● Decreases time to results
● As accurate as needed
● Fully supported by analytics tools
● Supported by most “Big Data” tools
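One way to read "decreases time to results" and "as accurate as needed" is sampling: estimate a quantity from a random subset instead of scanning everything, and use the standard error to judge whether the estimate is accurate enough. A minimal sketch under that assumption, with a synthetic data set and a hypothetical `estimate_mean` helper:

```python
import math
import random
import statistics


def estimate_mean(values, sample_size, seed=0):
    """Estimate the mean of values from a simple random sample.

    Returns (sample mean, standard error of that mean).
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, stderr


# Synthetic stand-in for a large data set (assumption: uniform integers).
rng = random.Random(42)
population = [rng.randint(0, 999) for _ in range(200_000)]

true_mean = statistics.fmean(population)                  # full scan
estimate, stderr = estimate_mean(population, 5_000)       # 2.5% of the rows

print(f"full scan: {true_mean:.2f}")
print(f"sample:    {estimate:.2f} +/- {1.96 * stderr:.2f} (approx. 95% interval)")
```

A larger sample shrinks the standard error roughly with the square root of the sample size, so accuracy can be traded against run time.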
Analytics Tools
● Can access data of most sizes
– Most can handle Hadoop and some NoSQL
databases
● Built for Predictive Modeling
● Starting to handle social/network modeling
How to Get Started
● Grab some tools!
– RapidMiner (http://rapidminer.com/)
– R (http://www.r-project.org/)
– Weka (http://www.cs.waikato.ac.nz/ml/weka/)
● Grab some data!
– http://www.kdnuggets.com/datasets/index.html
– http://aws.amazon.com/publicdatasets/
– http://www.reddit.com/r/datasets
Prizes/Challenges
● Kaggle - https://www.kaggle.com/
● MIT - http://bigdata.csail.mit.edu/challenge
● Heritage Health Prize - http://www.heritagehealthprize.com/c/hhp
● Twitter - @OpenDataAlex
● LinkedIn - alexmeadows
● Github - dbaAlex
Questions? Comments?
