Hadoop Ecosystem Architecture Overview

Hadoop Technologies
Architecture Overview

@senthil245

Mail - senthil245@gmail.com

DISTRIBUTED CLUSTER ARCHITECTURE: MASTER/SLAVE

WHEN MAPREDUCE
Because MapReduce runs across a cluster of
computing nodes, the architecture is highly
scalable.
• In other words, if the data size grows by a
factor of x, processing time should stay roughly
constant, provided computing capacity is added
by a predictable, fixed factor of y.
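
The scaling claim above can be sketched with a toy model. It assumes perfectly parallel work and a hypothetical per-node throughput (gb_per_node_hour is a made-up figure, not a benchmark), so in this idealized case y simply equals x. Real clusters lose some efficiency to coordination, so treat it as an illustration only.

```python
def processing_time(data_gb, nodes, gb_per_node_hour=100.0):
    """Idealized hours needed to process data_gb on `nodes` machines.

    Assumes the work splits perfectly across nodes; the default
    throughput is a hypothetical value chosen for round numbers.
    """
    return data_gb / (nodes * gb_per_node_hour)


# Baseline: 1,000 GB on 10 nodes.
base = processing_time(data_gb=1_000, nodes=10)

# Data grows by x = 4 and the cluster grows by y = 4.
scaled = processing_time(data_gb=4_000, nodes=40)

print(base, scaled)  # both runs take the same idealized time
```

With a fixed cluster the same model gives a processing time that grows linearly with the data, which is the case the slide contrasts against.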

The graph on the right illustrates the
relationship between data size (x-axis) and processing time (y-axis).
• The blue curve shows processing with
traditional programming; the black curve shows
processing with Hadoop.
• When the data size is small, traditional
programming performs better because
bootstrapping Hadoop is expensive (copying data
within the cluster, inter-node communication,
etc.).

Once the data size is big enough, the penalty
of the Hadoop bootstrap becomes negligible.
• Hence Hadoop is best suited to Big Data
crunching, ideally at petabyte scale, and is
not suited to implementing common data
integration patterns.
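
The trade-off described above (a fixed bootstrap cost that is amortized as data grows) can be sketched as a simple cost model. Every number here is a hypothetical assumption chosen for illustration: bootstrap_secs, secs_per_gb and nodes do not come from the slides or from any measurement.

```python
def traditional_time(data_gb, secs_per_gb=10.0):
    """Single-machine processing: no startup cost, no parallelism."""
    return data_gb * secs_per_gb


def hadoop_time(data_gb, bootstrap_secs=300.0, secs_per_gb=10.0, nodes=10):
    """Cluster processing: fixed bootstrap cost, work split across nodes."""
    return bootstrap_secs + data_gb * secs_per_gb / nodes


def break_even_gb(bootstrap_secs=300.0, secs_per_gb=10.0, nodes=10):
    """Data size at which both approaches take the same time.

    Solves data * s = b + data * s / n for data.
    """
    return bootstrap_secs / (secs_per_gb * (1 - 1 / nodes))


for size_gb in (1, 10, 100, 1_000):
    t, h = traditional_time(size_gb), hadoop_time(size_gb)
    winner = "traditional" if t < h else "hadoop"
    print(f"{size_gb:>5} GB: traditional={t:.0f}s hadoop={h:.0f}s -> {winner}")

print(f"break-even at about {break_even_gb():.1f} GB under these assumptions")
```

Below the break-even size the bootstrap cost dominates and the single-machine approach wins; above it the cluster wins by a widening margin, which matches the shape of the two curves on the slide.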

APACHE OOZIE – WORKFLOW SCHEDULER (CHECK AZKABAN & LINKEDIN OPENSOURCE)

PIG AND HQL (DO NOT USE HQL)

APACHE S4 (STREAM PROCESSING) (ALSO CHECK KAFKA AND STORM)

APACHE ZOOKEEPER SERVICE (ALSO CHECK APACHE HUE)

APACHE HCATALOG, HIVE AND HBASE
