© 2016 Ness SES. All Rights Reserved
BIG DATA
Open Source Projects vs Amazon Services
MOLDOVAN Radu Adrian
Iasi, May 2016
Who am I? :)
❏ passionate about technology
❏ 20 years of programming using open source
❏ last 4 years in Big Data
❏ Big Data Architect @
… where Enterprise ends and Big Data starts
Classic enterprise setup for www.XYZ.com (diagram):
- Load Balancer 1 … Load Balancer n
- Web Server 1.1 … Web Server 1.x, up to Web Server n.1 … Web Server n.x
- Database, search index, Cache (read/write from the web servers)
  ← Single Point of Failure
  ← Limited Scalability
… where Enterprise ends and Big Data starts
Distributed setup for www.XYZ.com (diagram):
- Load Balancer 1 … Load Balancer n
- Web Server 1.1 … Web Server 1.x, up to Web Server n.1 … Web Server n.x (read/write)
- noSQL Ring: nodes 1, 2, 3, 4, 5
- search: nodes 1, 2, 3, 4 … n
- DFS + Resource Manager: nodes 1, 2 … n, each with its own HDDs, CPU and RAM
- DFS, MPP, RES. MANAGER
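One common way to build a "noSQL Ring" like the one in the diagram is consistent hashing: each node gets a position on a hash ring, and a key is stored on the first nodes found clockwise from the key's own position. The sketch below is only an illustration under stated assumptions: the node names `node-1` … `node-5`, the MD5 hash and the replica count of 3 are hypothetical choices, not taken from the slides or from any particular product.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (a 128-bit integer)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with one position per node (no virtual nodes)."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Sort nodes by their position so we can walk the ring clockwise.
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._ring]

    def owners(self, key):
        """Return the nodes holding `key`: the first node clockwise plus its successors."""
        start = bisect.bisect_right(self._positions, ring_position(key)) % len(self._ring)
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


# Hypothetical five-node ring, mirroring nodes 1..5 in the diagram.
ring = HashRing([f"node-{i}" for i in range(1, 6)])
print(ring.owners("user:42"))
```

Because every web server can compute the owners of a key locally, reads and writes need no central database, which is what removes the single point of failure shown on the previous slide.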
Architecture - High Level
INFRASTRUCTURE LAYER: Database, Analytics, Bigdata
INFORMATION LAYER
MULTI CHANNEL DELIVERY: Dashboard, Laptop, Mobile/Tablet, Email, SMS, Print
ANALYTICS LAYER: Realtime, Near Realtime, Reports + Statistics, Custom Tools
Data Processing
- system generated data
- dimensional data
- de/normalize data
Data Ingestion/Extraction
- external data
- reference internal data
- discovery data
Data Loading
- operational data
- business information data
Big data - ETL + BI
ERP, Flat Files, CRM, Live Stream, RDBMS, Web Services
ETL: Extract, Transform, Load
Massive Parallel Processing, Distributed System
noSQL DB, warehouse DB (OLAP), search engines
BI (Business Intelligence): Web Services, Data Science, Data Monetization, Data Exploration, Data Visualisation
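The Extract, Transform, Load flow feeding a BI query can be sketched in a few lines. This is a toy example under stated assumptions: the flat-file contents, the `sales` table and the column names are all made up for illustration, and an in-memory SQLite database stands in for the warehouse DB.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; a real pipeline would read from ERP, CRM, streams, etc.
RAW_CSV = """customer,country,amount
alice,ro,10.50
bob,RO,4.25
carol,de,7.00
"""


def extract(text):
    """Extract: parse the flat file into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, normalize country codes, cast amounts to float."""
    return [
        (row["customer"].strip(), row["country"].strip().upper(), float(row["amount"]))
        for row in rows
    ]


def load(conn, records):
    """Load: write the cleaned records into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# BI side: an aggregate query over the loaded data.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('DE', 7.0), ('RO', 14.75)]
```

In a Big Data setting each of the three functions would run in parallel across many nodes, but the division of responsibilities stays the same.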
CONSISTENCY (quorum), AVAILABILITY, PARTITIONING
- RDBMS
- HP Vertica (Columnar)
- Cassandra (Columnar)
- Dynamo (Key-Value)
- Couchbase (Document)
- Riak (Key-Value)
- HDFS
- HBase (Columnar)
- MongoDB (Document)
- Redis (Key-Value)
- Memcached (Key-Value)
2
CAP Theorem
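Assuming the "(quorum)" note next to CONSISTENCY refers to Dynamo-style tunable consistency, the usual rule is that with N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to overlap in at least one replica when R + W > N. The helper names below are hypothetical; the sketch only checks that inequality.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


n = 3
q = majority(n)                      # 2 of 3 replicas
print(quorums_overlap(n, w=q, r=q))  # True: reads see the latest acknowledged write
print(quorums_overlap(n, w=1, r=1))  # False: faster, but a read may miss the latest write
```

Lowering W or R trades consistency for availability and latency, which is the tension the CAP triangle describes.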
Cluster ecosystem - components
Coordinator: ZooKeeper
Management: Ambari
Workflow: Oozie, ???NiFi
Security: Ranger + Knox + Falcon, Kerberos, LDAP
Monitoring: Ganglia, Nagios
Logs: Kibana, Logstash
Cluster ecosystem - COLLECT
COLLECT PROCESS STORE VISUALIZE
Data Integration: Talend, Informatica
Data Streaming: Storm, MapR Streams, Spark Streaming, Flink Stream
Data Aggregation: Flume, Scribe
Msg Brokers + Streams: RabbitMQ, ActiveMQ, Kafka
Data Loader: Sqoop
Data Governance: Atlas
Amazon Services: Amazon Simple Queue Service (SQS), Amazon Kinesis
Cluster ecosystem - PROCESS
COLLECT PROCESS STORE VISUALIZE
HADOOP (HDFS)
Res. Manager: Mesos, Yarn
MapReduce, PIG
HIVE
TEZ
Analytics: Impala (Drill)
GRAPHs: Spark GraphX, Neo4J, Titan, Flink Gelly
HBase, MongoDB
In Memory: Spark
Cloudera, Hortonworks, MapR
Amazon Services: Amazon DynamoDB, Amazon EC2, Amazon EMR, Amazon S3, Amazon Glacier
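The MapReduce model listed above can be illustrated without a cluster. The sketch below runs the three phases (map, shuffle, reduce) of a word count in a single Python process; the function names and the sample lines are made up for illustration, and a real Hadoop job would distribute each phase across the nodes of the DFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


lines = ["big data", "open source big data", "amazon services"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because map calls are independent per line and reduce calls are independent per key, both phases parallelize naturally, which is why higher-level tools such as PIG and HIVE can compile down to this model.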
Cluster ecosystem - STORE
COLLECT PROCESS STORE VISUALIZE
Warehouse DB: Presto (ANSI), HP Vertica
Search Engines: SolrCloud, Elasticsearch
Columnar Store: Cassandra, Accumulo
Machine Learning: Spark ML, FlinkML, Mahout
Key-Value Store: Redis, Riak, Memcached
Amazon Services: Amazon Redshift, Amazon DynamoDB, Amazon ElastiCache, Amazon Elasticsearch, Amazon ML
Cluster ecosystem - components
COLLECT PROCESS STORE VISUALIZE
Tableau, Logi, Jasper Reports, D3, Pentaho*, Crystal Reports*
Cluster ecosystem - VISUALIZE
COLLECT PROCESS STORE VISUALIZE
COLLECT
- Data Integration: Talend, Informatica
- Data Streaming: Storm, MapR Streams, Spark Streaming, Flink Stream
- Data Aggregation: Flume, Scribe
- Msg Brokers + Streams: RabbitMQ, ActiveMQ, Kafka
- Data Loader: Sqoop
- Data Governance: Atlas
PROCESS
- HADOOP (HDFS)
- Res. Manager: Mesos, Yarn
- MapReduce, PIG
- HIVE
- TEZ
- Analytics: Impala (Drill)
- GRAPHs: Spark GraphX, Titan, Neo4J, Flink Gelly
- HBase, MongoDB
- In Memory: Spark
- Cloudera, Hortonworks, MapR
STORE
- Warehouse DB: Presto (ANSI), HP Vertica
- Search Engines: SolrCloud, Elasticsearch
- Columnar Store: Cassandra, Accumulo
- Machine Learning: Spark ML, FlinkML, Mahout
- Key-Value Store: Redis, Riak, Memcached
VISUALIZE
- Tableau, Logi, Jasper Reports, D3, Pentaho*, Crystal Reports
- Interactive Reporting
Trends - Forbes report Q1 2016
http://www.forbes.com/sites/gilpress/2016/03/14/top-10-hot-big-data-technologies/#7cd07887f26a
Thank you!
Skype: r.moldovan

Big data advanced topics - part I
