Big-Data Processing utilizing 
Open-Source Technology Stack 
By 
Amir Sedighi 
http://www.linkedin.com/in/amirsedighi
@amirsedighi 
Linux and Ubuntu 14.10 Release Conf 1
Data Explosion 
● Big-Data refers to the fact that everything we do
increasingly leaves a digital trace that we (or others)
can gather, use, and analyze.
– Data Providers
● Businesses
● People
Volume, Velocity, Variety 
● “There was 5 exabytes of 
information created between 
the dawn of civilization 
through 2003, but that much 
information is now created 
every 2 days, and the pace is 
increasing.” Eric Schmidt 
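To put the quoted figure in perspective, the short sketch below converts "5 exabytes every 2 days" into a sustained data rate. It assumes decimal (SI) units, that is 1 exabyte = 10^18 bytes; the constant and function names are illustrative and not part of the original slides.

```python
# Assumption: decimal (SI) exabyte, 10**18 bytes.
EXABYTE = 10**18
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(total_bytes, days):
    """Average rate needed to produce total_bytes over the given number of days."""
    return total_bytes / (days * SECONDS_PER_DAY)


# 5 exabytes created every 2 days, as stated in the quote.
rate = bytes_per_second(5 * EXABYTE, 2)
terabytes_per_second = rate / 10**12

if __name__ == "__main__":
    print(f"{terabytes_per_second:.1f} TB/s")
```

Under these assumptions the quote corresponds to roughly 29 terabytes of new data per second.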
Big-Data Processing 
How to provide a 
Big-Data processing platform 
using commodity machines? 
Vertical or Horizontal? 
Scale Up vs Scale Out 
Big-Data Processing 
Open-Source Technology Stack 
Map-Reduce 
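As a minimal, single-process sketch of the Map-Reduce idea (not Hadoop's actual API), the word-count example below separates the three conceptual phases: map, shuffle, and reduce. All function names here are hypothetical and chosen for illustration; a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big stack", "open source stack"]
    print(word_count(docs))
```

Because each call to `map_phase` looks at only one record and each call to `reduce_phase` looks at only one key, both steps can be distributed across commodity machines, which is the scale-out property the preceding slides ask for.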
Hadoop Framework 
Apache Hadoop Main Projects 
Data Stores 
● Data Stores 
– Key-Value
– Graph
– Columnar
– Document Store
– In-Memory
Data Transfer 
● Apache Flume 
● Apache Sqoop
Search 
● Elasticsearch 
● Apache Solr
Messaging and Queuing 
● Apache Kafka 
● ZeroMQ
Log Management 
● ELK (Elasticsearch, Logstash, Kibana)
● Logstash
● Fluentd
Stream Processing 
● Apache Storm 
● Apache Samza 
● Apache Spark
Machine Learning 
● Apache Mahout 
● MLlib
● GraphX
Questions? 

