Big Data for Managers:
From Hadoop to Streaming and Beyond
Dr. Vladimir Bacvanski
vladimir.bacvanski@scispike.com
@OnSoftware
www.scispike.com    Copyright © SciSpike 2016
Dr. Vladimir Bacvanski
• Founder of SciSpike, a development, consulting, and training firm
• Passionate about software and data
• PhD in computer science, RWTH Aachen, Germany
• Architect, consultant, mentor
• Custom development: scalable Web and IoT systems
• Training and mentoring in Big Data, Scala, Node.js, software architecture
@OnSoftware
https://www.linkedin.com/in/vladimirbacvanski
Problems with Relational Stores
• Data that does not naturally fit into tables → impedance mismatch
• Development time often too long
• Dealing with unstructured data
• Performance problems
• Difficult to run on clusters
• Cost
Structured and Unstructured Data Sources
Structured Data Sources
• Existing databases
• ERP/CRM/BI systems
• Inventory
• Supply chain
Unstructured Data Sources
• Server logs
• Search engine logs
• Browsing logs
• E-commerce records
• Social media
• Voice
• Video
• Sensor data
NoSQL Impact
[Figure: cost/performance of disks and processors vs. data volume (1M, 1B, 1T, 1Q, …HUGE!!!, each step x1000). Relational database: today doable, but expensive and slow; tomorrow the volume is out of reach. Big Data + NoSQL: stabilize cost and increase performance; enable unlimited volume growth.]
Scale Up vs. Scale Out
[Figure: capability vs. cost, plotted once for scaling up and once for scaling out.]
A Common Pattern for Processing Large Data
• Load a large set of records onto a set of machines
• Extract something interesting from each record ("Map")
• Shuffle and sort intermediate results
• Aggregate intermediate results ("Reduce")
• Store end result
Intermediate results are key/value pairs.
Two Key Aspects of Hadoop
• MapReduce framework
  – How Hadoop understands and assigns work to the nodes (machines)
• Hadoop Distributed File System (HDFS)
  – Where Hadoop stores data
  – A file system that spans all the nodes in a Hadoop cluster
  – It links together the file systems on many local nodes to make them into one big file system
MapReduce Example: Word Count
• WordCount is the "Hello World" of Big Data
  – You will see various technologies implementing it
  – A good first step to compare the expressiveness of Big Data tools
[Figure: word count data flow]
pets.txt:
  dog cat bird
  dog cat bird
  dog dog cat
Map:
  dog, 1 / cat, 1 / bird, 1
  dog, 1 / cat, 1 / bird, 1
  dog, 1 / dog, 1 / cat, 1
Shuffle:
  dog, 1 / dog, 1 / dog, 1 / dog, 1
  cat, 1 / cat, 1 / cat, 1
  bird, 1 / bird, 1
Reduce (pet_freq.txt):
  dog, 4
  cat, 3
  bird, 2
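The word count data flow can be reproduced as a single-process sketch in plain Python. This is not Hadoop code. The function names are hypothetical, and the shuffle is simulated with an in-memory dictionary instead of a network transfer between nodes. The dictionary is sorted by key because Hadoop sorts intermediate keys.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values emitted for the same key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: aggregate the grouped values for each key (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))

if __name__ == "__main__":
    pets = ["dog cat bird", "dog cat bird", "dog dog cat"]  # contents of pets.txt
    for word, count in word_count(pets).items():
        print(f"{word}, {count}")
```

In Hadoop, each phase runs in parallel on many nodes. The per-key logic is the same.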
The MapReduce Programming Model
• "Map" step:
  – Input is split into pieces
  – Worker nodes process individual pieces in parallel, under global control of the JobTracker node (in classic Hadoop 1 MapReduce)
  – Each worker node stores its result in its local file system, where a reducer is able to access it
• "Reduce" step:
  – Worker nodes aggregate ("reduce") the data produced by the map steps, under control of the JobTracker
  – Multiple reduce tasks can parallelize the aggregation
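Several reduce tasks can only share the aggregation if every intermediate key goes to exactly one reducer. A common approach is hash(key) mod number-of-reducers, similar in spirit to Hadoop's default HashPartitioner. The sketch below shows that approach in plain Python. The names `partition` and `assign_to_reducers` are hypothetical. CRC32 is used because Python's built-in string hash is randomized per process and would not give stable routing.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def partition(key: str, num_reducers: int) -> int:
    """Pick the reducer for a key; the same key always maps to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def assign_to_reducers(pairs: Iterable[Tuple[str, int]],
                       num_reducers: int) -> Dict[int, Dict[str, List[int]]]:
    """Route each (key, value) pair to its reducer and group values by key there."""
    reducers = defaultdict(lambda: defaultdict(list))
    for key, value in pairs:
        reducers[partition(key, num_reducers)][key].append(value)
    # Convert the nested defaultdicts to plain dicts for readability.
    return {r: dict(groups) for r, groups in reducers.items()}

if __name__ == "__main__":
    mapped = [("dog", 1), ("cat", 1), ("bird", 1), ("dog", 1), ("dog", 1), ("cat", 1)]
    for reducer_id, groups in sorted(assign_to_reducers(mapped, 2).items()):
        totals = {key: sum(values) for key, values in groups.items()}
        print(f"reducer {reducer_id}: {totals}")
```

Each reducer sees every value for its keys, so the reducers can sum independently without coordinating.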
Separation of Work
Programmers
• Map
• Reduce
Framework
• Deals with fault tolerance
• Assigns workers to map and reduce tasks
• Moves processes to data
• Shuffles and sorts intermediate data
• Deals with errors
How to Create MapReduce Jobs
• Java API
  – Low level, very flexible
  – Time-consuming development
• Streaming API
  – A simple, productive model for Python, Ruby, and any other language that reads standard input and writes standard output
• Hive
  – Open source / Apache project (originally a Hadoop sub-project)
  – Provides a SQL-like interface to Hadoop
• Pig
  – Data flow language (Pig Latin) / Apache project (originally a Hadoop sub-project)
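With the Streaming API, the mapper and the reducer are ordinary programs. Each reads lines from standard input and writes tab-separated key/value lines to standard output. Hadoop sorts the mapper output by key before the reducer sees it. The sketch below shows that contract for word count in Python. The function names `mapper` and `reducer` are illustrative. The local `sorted()` call only stands in for Hadoop's shuffle and sort. In a real job, each function would live in its own script that reads `sys.stdin`.

```python
from itertools import groupby
from typing import Iterable, Iterator

def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one "word<TAB>1" line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word; relies on input sorted by key, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__":
    # Local simulation of: mapper | sort | reducer, on the pets.txt sample.
    pets = ["dog cat bird\n", "dog cat bird\n", "dog dog cat\n"]
    for out in reducer(sorted(mapper(pets))):
        print(out)
```

The reducer uses `groupby` because it only needs to hold one key's values at a time. That works only because the input arrives sorted.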
	
The Big Picture: NoSQL + Hadoop in Applications
[Figure: applications backed by several stores: columnar (price updates, logs), document (product info), graph (customer/agent relationships), RDB (XA data), key/value (session data), and Hadoop (operational analytics, price analytics).]
Streaming: A New Paradigm
• Conventional processing: static data
  Data → Queries → Results (queries run over stored data)
• Real-time processing: streaming data
  Queries → Data → Results (data flows through standing queries)
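In streaming, a standing query consumes events as they arrive and emits a result whenever a time window closes. The sketch below counts events per key in fixed (tumbling) windows in plain Python. Several details are assumptions made for illustration: the event format (timestamp, key), the window size, and the name `tumbling_window_counts`. The sketch also assumes events arrive in timestamp order, whereas real streaming engines also handle late data.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key): an assumed event format

def tumbling_window_counts(events: Iterable[Event],
                           window_size: int) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts) each time a window closes.

    Assumes events arrive ordered by timestamp.
    """
    current_window: Optional[int] = None
    counts: Counter = Counter()
    for timestamp, key in events:
        window_start = timestamp - timestamp % window_size
        if current_window is not None and window_start != current_window:
            # A newer window began, so the previous one is complete: emit it.
            yield current_window, dict(counts)
            counts = Counter()
        current_window = window_start
        counts[key] += 1
    if current_window is not None:
        # End of a finite stream: flush the last open window.
        yield current_window, dict(counts)

if __name__ == "__main__":
    clicks = [(1, "home"), (3, "cart"), (7, "home"), (11, "home"), (12, "cart")]
    for start, counts in tumbling_window_counts(clicks, window_size=10):
        print(start, counts)
```

Results appear while the data is still flowing, instead of after the whole data set has been stored.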
Common Streaming Applications
• Personalization
• Search
• Revenue optimization
• User events
• Content feeds
• Log processing
• Monitoring
• Recommendations
• Ads
• Notable users:
  – Twitter
  – Yahoo
  – Spotify
  – Cisco
  – Flickr
  – Weather Channel
Beyond Hadoop: Spark & Flink
[Figure: successive execution engines: MapReduce, Tez, Spark, Flink.]
Apache Spark
• Important features
  – In-memory data
  – Resilient Distributed Datasets (RDDs)
    • Datasets can rebuild themselves if a failure occurs
  – Rich set of operators
• Efficient:
  – Reported to be up to 10x (on disk) to 100x (in memory) faster than Hadoop MapReduce for some workloads
  – 2 to 5 times less code (rich APIs in Scala/Java/Python)
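RDDs can rebuild themselves because each one records its lineage: the parent dataset and the transformation that produced it. If a partition is lost, Spark recomputes it from that lineage instead of restoring a replica. The toy class below illustrates the idea in plain Python. `MiniRDD` and its methods are hypothetical and do not mirror Spark's API or scheduler. Unlike Spark, this toy also caches every partition it computes.

```python
from typing import Callable, List, Optional

class MiniRDD:
    """Toy dataset that remembers how to recompute each partition (its lineage)."""

    def __init__(self, compute: Callable[[int], List], num_partitions: int):
        self._compute = compute                  # recipe: partition index -> records
        self.num_partitions = num_partitions
        self._cache: List[Optional[List]] = [None] * num_partitions

    @staticmethod
    def from_partitions(partitions: List[List]) -> "MiniRDD":
        return MiniRDD(lambda i: list(partitions[i]), len(partitions))

    def map(self, fn: Callable) -> "MiniRDD":
        # Lineage: the child's recipe asks the parent for its partition, then applies fn.
        return MiniRDD(lambda i: [fn(x) for x in self.partition(i)], self.num_partitions)

    def filter(self, pred: Callable) -> "MiniRDD":
        return MiniRDD(lambda i: [x for x in self.partition(i) if pred(x)], self.num_partitions)

    def partition(self, i: int) -> List:
        if self._cache[i] is None:               # missing or lost: recompute from lineage
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        """Simulate a node failure that drops one cached partition."""
        self._cache[i] = None

    def collect(self) -> List:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

if __name__ == "__main__":
    base = MiniRDD.from_partitions([[1, 2, 3], [4, 5, 6]])
    even_squares = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(even_squares.collect())
    even_squares.lose_partition(1)
    print(even_squares.collect())  # partition 1 is rebuilt from its lineage
```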
Spark Architecture
• A powerful set of tools
• Beyond traditional Hadoop
[Figure: Spark component stack. Source: http://spark.apache.org]
Data Sharing in Apache Spark
[Figure: data is read from HDFS; Iteration 1 and Iteration 2 keep their results (Result 1, Result 2) held in cluster memory; Query 1 and Query 2 are answered from cluster memory.]
Apache Flink
• Execution:
  – Programs are compiled into an execution plan
  – The plan is optimized
  – The plan is executed
• Design goals:
  – High performance
  – Hybrid batch and streaming runtime
  – Simplicity for the developer
  – Rich libraries
  – Integration with many systems
Apache Flink Components
• Integration with Hadoop YARN, MapReduce, HBase, Cassandra, Kafka, …
• Execution engine (runner) for Apache Beam (Google Dataflow)
Flink Optimization and Execution
• The optimizer selects an execution plan
• Similar to what we have in relational databases
• The optimal plan depends on the size of the input files
• Runs standalone or on top of Hadoop
• Integration with many Hadoop technologies
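A join shows why the optimal plan depends on input sizes. When one input is small, it can be broadcast to every worker. When both inputs are large, repartitioning both sides by the join key is usually cheaper. The sketch below is a deliberately simplified cost model in Python. It is loosely inspired by the broadcast-versus-repartition choice that cost-based optimizers such as Flink's make. The cost formulas and the names `JoinPlan` and `choose_join_strategy` are assumptions, not Flink's actual optimizer.

```python
from dataclasses import dataclass

@dataclass
class JoinPlan:
    strategy: str
    network_cost: float  # estimated data shipped over the network (toy model)

def choose_join_strategy(left_size: float, right_size: float, num_workers: int) -> JoinPlan:
    """Pick the cheaper of two join strategies under a toy network-cost model."""
    smaller = min(left_size, right_size)
    # Broadcast: send a full copy of the smaller input to every worker.
    broadcast = JoinPlan("broadcast smaller input", smaller * num_workers)
    # Repartition: hash-partition both inputs by join key, shipping each roughly once.
    repartition = JoinPlan("repartition both inputs", left_size + right_size)
    return min(broadcast, repartition, key=lambda plan: plan.network_cost)

if __name__ == "__main__":
    # Sizes in GB (illustrative numbers only).
    print(choose_join_strategy(50.0, 0.1, num_workers=20).strategy)
    print(choose_join_strategy(50.0, 40.0, num_workers=20).strategy)
```

The same query gets different plans depending only on the input sizes, which is why the optimizer needs size estimates.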
Flink & Spark: The Advantages and Outlook
• Less I/O overhead than conventional Hadoop
• Caching
• Iterative algorithms
• Unifying batch and stream computing
• Scala as a natural, expressive language for Big Data
  – Other languages: Python, Java, R
• Beware of less mature components
Typical NoSQL Systems
• Non-relational
• Distributed
• Horizontally scalable
• No need for a fixed schema
• Several established players
• Systems are specialized
NoSQL Stores and Their Categories
• Choose the store that best matches your application
• It is fine to use several different stores
  – "Polyglot persistence"
[Figure: the four categories: key-value, column-family, document-oriented, graph DB.]
NoSQL Stores: Scale vs. Complexity of Data
[Figure: key-value, column-family, document-oriented, and graph DB stores positioned by data complexity and scalability, with a region marking the needs of most applications.]
Key-Value Stores
• Key → value mapping
• Large, persistent map ("hashtable")
  – Values could be lists and hashes
• Easy to use
• Scale very well
• Data model may be too simple for most applications
• Systems:
  – Redis, Riak, Memcached, Amazon DynamoDB, Aerospike, FoundationDB
• Use when the data model is very simple and scalability is essential
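Key-value stores scale well partly because each operation touches a single key. Keys can therefore be spread across nodes by hashing, and the node that holds a key can be found without coordination. The sketch below models such a store in Python, sharded across in-memory "nodes". The class name `ShardedKVStore` and the CRC32-based placement are illustrative assumptions. Production systems typically add consistent hashing and replication, which this sketch omits.

```python
import zlib
from typing import Any, Dict, List, Optional

class ShardedKVStore:
    """Toy key-value store whose keys are spread over several in-memory nodes."""

    def __init__(self, num_nodes: int):
        self.nodes: List[Dict[str, Any]] = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> Dict[str, Any]:
        # Deterministic hash -> node index; each operation touches exactly one node.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key: str, value: Any) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self._node_for(key).get(key, default)

    def delete(self, key: str) -> None:
        self._node_for(key).pop(key, None)

if __name__ == "__main__":
    store = ShardedKVStore(num_nodes=4)
    # Session data keyed by session id; the value can be a JSON-like dict.
    store.put("session:42", {"user": "alice", "cart": ["book", "lamp"]})
    print(store.get("session:42"))
    print([len(node) for node in store.nodes])
```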
	
Typical Use Cases
• The data model is very simple!
  – Actual data can be JSON
• Session data
• User preferences and profiles
• Shopping cart
• If another NoSQL store is already good enough, you may want to skip this and let a column-family or document store handle it
Column-Family
• "Column family": similar to a table
  – The table is sparse
• Key → (column:value)*
• Columns have names
• Can be indexed
• Can store complex data
  – Denormalize!
• Systems:
  – Google Bigtable, HBase, Cassandra, Amazon SimpleDB, Hypertable
• Use when scalability is essential
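The Key → (column:value)* model can be pictured as a map from each row key to only the columns that row actually has. Missing columns therefore cost nothing in a sparse table. This Python sketch is a toy, single-node model. The class `ColumnFamily` and its methods are hypothetical. It leaves out what real systems such as HBase or Cassandra add, including timestamps/versions, persistence, and distribution.

```python
from typing import Dict, Iterator, Optional, Tuple

class ColumnFamily:
    """Toy column family: row key -> {column name: value}, storing only present columns."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, str]] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key: str, column: str) -> Optional[str]:
        return self.rows.get(row_key, {}).get(column)

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Dict[str, str]]]:
        """Yield rows with start <= key < stop in key order, like a range scan."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, dict(sorted(self.rows[key].items()))

if __name__ == "__main__":
    logs = ColumnFamily()
    # Denormalized log rows: row key = source + timestamp; each row has its own columns.
    logs.put("web01#2016-05-01T10:00", "status", "200")
    logs.put("web01#2016-05-01T10:00", "path", "/cart")
    logs.put("web01#2016-05-01T10:05", "status", "500")
    logs.put("web01#2016-05-01T10:05", "error", "timeout")
    for key, columns in logs.scan("web01#2016-05-01", "web01#2016-05-02"):
        print(key, columns)
```

The row-key design (source, then timestamp) lets one range scan read all rows for one source and one day, a common pattern for logging workloads.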
Typical Use Cases
• High insert volume: logging
• Real-time updates
• Content management
• Expiring content
• Cross-datacenter replication
• MapReduce analytics over stored data
• You don't need conventional (ACID) transactions
Document Stores
• JSON, BSON, XML
• No schema
• Indexes improve performance
• Easy transition from RDBMS
• Systems:
  – MongoDB, CouchDB, Couchbase
• Use when data is in semi-structured form
• Often seen in new Web applications
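A schemaless collection with a secondary index fits in a few lines. Documents with different fields live side by side. An index on one field lets equality lookups avoid scanning every document. The Python below is a toy model. `DocumentCollection`, `create_index`, and `find` are hypothetical names that do not reproduce any particular product's API. The sketch assumes indexed field values are hashable.

```python
from collections import defaultdict
from typing import Any, Dict, List

class DocumentCollection:
    """Toy schemaless collection of JSON-like dicts with optional single-field indexes."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []
        self.indexes: Dict[str, Dict[Any, List[int]]] = {}

    def insert(self, doc: Dict[str, Any]) -> None:
        position = len(self.docs)
        self.docs.append(doc)
        for field, index in self.indexes.items():  # keep existing indexes up to date
            if field in doc:
                index[doc[field]].append(position)

    def create_index(self, field: str) -> None:
        index: Dict[Any, List[int]] = defaultdict(list)
        for position, doc in enumerate(self.docs):
            if field in doc:
                index[doc[field]].append(position)
        self.indexes[field] = index

    def find(self, field: str, value: Any) -> List[Dict[str, Any]]:
        if field in self.indexes:                              # indexed: direct lookup
            return [self.docs[p] for p in self.indexes[field].get(value, [])]
        return [d for d in self.docs if d.get(field) == value]  # no index: full scan

if __name__ == "__main__":
    products = DocumentCollection()
    # Documents need not share a schema: the second one has an extra, nested field.
    products.insert({"sku": "A1", "name": "Lamp", "category": "home"})
    products.insert({"sku": "B7", "name": "Novel", "category": "books",
                     "author": {"first": "Jane", "last": "Doe"}})
    products.create_index("category")
    print(products.find("category", "books"))
```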
Typical Use Cases
• Logging
  – Especially with variable content
• Product information
• Customer information
• Content management
• Data to be stored has a format that varies over time
  – Flexible schema
• Web analytics
Graph Databases
• Nodes with properties
• Nodes connected through relationships
• Can model very complex graph data
  – Social networks
• Systems:
  – Neo4j, InfiniteGraph, TitanDB, OrientDB
• Use when data is a (complex) graph
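A typical graph query is "recommend people my friends know but I do not know yet", which is a two-hop traversal over relationships. The sketch below keeps a small property graph in Python dicts and runs that traversal. Several details are illustrative assumptions: the class `PropertyGraph`, the `KNOWS` relationship type, and ranking by number of mutual friends. None of this is any particular graph database's query language.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Set, Tuple

class PropertyGraph:
    """Toy property graph: nodes with properties, typed relationships between them."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: Dict[Tuple[str, str], Set[str]] = defaultdict(set)  # (node, type) -> neighbors

    def add_node(self, node_id: str, **properties: Any) -> None:
        self.nodes[node_id] = properties

    def relate(self, a: str, rel_type: str, b: str) -> None:
        # Store both directions so the relationship can be traversed either way.
        self.edges[(a, rel_type)].add(b)
        self.edges[(b, rel_type)].add(a)

    def neighbors(self, node_id: str, rel_type: str) -> Set[str]:
        return self.edges.get((node_id, rel_type), set())

    def friends_of_friends(self, person: str) -> List[Tuple[str, int]]:
        """Rank people two KNOWS-hops away by how many mutual friends they share."""
        friends = self.neighbors(person, "KNOWS")
        counts: Counter = Counter()
        for friend in friends:
            for candidate in self.neighbors(friend, "KNOWS"):
                if candidate != person and candidate not in friends:
                    counts[candidate] += 1
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

if __name__ == "__main__":
    g = PropertyGraph()
    for name in ["ann", "bob", "cy", "dee", "eve"]:
        g.add_node(name, kind="person")
    g.relate("ann", "KNOWS", "bob")
    g.relate("ann", "KNOWS", "cy")
    g.relate("bob", "KNOWS", "dee")
    g.relate("cy", "KNOWS", "dee")
    g.relate("cy", "KNOWS", "eve")
    print(g.friends_of_friends("ann"))
```

The query follows relationships directly from node to node, with no join tables. That direct traversal is what graph databases are optimized for.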
Typical Use Cases
• Highly interconnected data
• Social graphs
• Party relationships in an enterprise
• Location-based services
• Purchasing analytics and recommendations
• Often combined with other systems that store the bulk of the data
  – The graph database can focus on relationships
Integrating Relational, Streams, and Hadoop
[Figure: traditional/relational data sources feed the database & warehouse (at-rest data analytics); non-traditional/non-relational sources with varied data formats (semi-structured, unstructured...) feed Big Data and NoSQL stores; an event system feeds streams for in-motion analytics with ultra-low-latency results; each path produces results.]
Lambda Architecture
[Figure: incoming data flows into both the batch layer (master dataset, batch update) and the event (speed) layer (real-time data, real-time update); the serving layer holds the batch view; queries merge results from the batch view with the rolling real-time values.]
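The merge step works like this. The batch layer periodically recomputes a view from the complete master dataset. The speed layer keeps rolling values for events that arrived since the last batch run. A query combines the two. This Python toy uses page-view counts. All names (`LambdaCounter`, `ingest`, `run_batch`, `query`) are illustrative assumptions. A real deployment would use separate systems for each layer.

```python
from collections import Counter
from typing import List

class LambdaCounter:
    """Toy Lambda architecture for counting events per key."""

    def __init__(self) -> None:
        self.master_dataset: List[str] = []      # batch layer: append-only log of all events
        self.batch_view: Counter = Counter()     # serving layer: result of the last batch run
        self.realtime_view: Counter = Counter()  # speed layer: rolling values since that run

    def ingest(self, key: str) -> None:
        """Incoming data goes to both the batch and the speed layer."""
        self.master_dataset.append(key)
        self.realtime_view[key] += 1

    def run_batch(self) -> None:
        """Batch update: recompute the view from scratch, then reset the speed layer.

        Simplification: assumes no events arrive while the batch job runs.
        """
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view = Counter()

    def query(self, key: str) -> int:
        """Merge results from the batch view and the real-time view."""
        return self.batch_view[key] + self.realtime_view[key]

if __name__ == "__main__":
    views = LambdaCounter()
    for page in ["home", "cart", "home"]:
        views.ingest(page)
    views.run_batch()
    views.ingest("home")           # arrives after the batch run
    print(views.query("home"))     # 2 from the batch view + 1 from the speed layer
```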
Master Data Management and Governance
• Big Data and NoSQL stores can easily become a bigger mess than relational stores
• Introduce a practical plan
  – Avoid lengthy and cumbersome governance
  – Actual use should be the driving force
  – Start slow
• Be ready for change
  – The technologies change rapidly
• Focus on business outcomes
Succeeding with Big Data and NoSQL
1. Actively look for solutions where the right store can ease the pain
2. Make sure you deliver tangible value to clients
3. After you get your first apps to work, create a Big Data introduction and governance plan
4. Prioritize: do the most useful thing for the business first
5. Integrate with existing IT
6. Make sure you hire or grow your Big Data champions
7. The field is immature: look out for new tools and techniques
Conclusions
– Hadoop and NoSQL address the weak points of relational systems:
  • Scale
  • Performance
  • Unstructured and semi-structured data
– Streaming addresses the processing of data in real time
– Integrate with conventional technologies!
– Spark and Flink: the next generation of Big Data systems
Questions?
