BIG DATA. 7 modern trends that every IT pro should know about - part 5/7
cc: mkandlez - https://www.flickr.com/photos/25541021@N00
Presented by Ibrahim Muhammadi.
Founder - AppWorx.cc
With more and more digitalization, huge
amounts of structured, semi-structured and
unstructured data are being generated.
cc: phsymyst - https://www.flickr.com/photos/78624556@N08
In the early days of this explosive growth in digital
data, businesses used to discard additional data
because there was no feasible way to make sense
of it.
cc: Kentrosaurus - https://www.flickr.com/photos/86125591@N00
But this is changing rapidly with advancements in the
infrastructure needed for data storage and processing,
collectively known as BIG DATA.
cc: Tom Raftery - https://www.flickr.com/photos/67945918@N00
The 3Vs of big data: the extreme volume of data, the wide
variety of data types, and the velocity at which the
data must be processed.
cc: dalbera - https://www.flickr.com/photos/72746018@N00
Such voluminous data can come from different
sources, such as business sales records, the collected
results of experiments, real-time sensors used in IoT,
and more.
cc: bionicteaching - https://www.flickr.com/photos/29096601@N00
Adequate compute power is needed to achieve the desired
velocity. This can demand hundreds or thousands
of servers that distribute the work and operate
collaboratively.
cc: midom - https://www.flickr.com/photos/81295370@N00
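As a rough illustration of distributing work, the sketch below splits a data set into chunks, lets each worker process its own chunk, and combines the partial results. It is a minimal sketch, not a cluster implementation: threads on one machine stand in for servers, and the names `split_into_chunks`, `partial_sum` and `distributed_sum`, as well as the sales figures, are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Split a list into n_chunks roughly equal contiguous slices."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional record each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def partial_sum(chunk):
    """Work done independently by one worker on its own slice."""
    return sum(chunk)


def distributed_sum(records, n_workers=4):
    """Fan the chunks out to workers, then combine the partial results."""
    chunks = split_into_chunks(records, n_workers)
    # Assumption: threads stand in for the servers of a real cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


sales = list(range(1, 1001))  # made-up sales figures
total = distributed_sum(sales)
print(total)
```

The same split, process, combine shape is what lets the work grow from a handful of workers to many servers, because no worker needs to see another worker's chunk.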
In this short presentation we will look at some of
the more popular tools that have made the Big
Data revolution possible.
cc: Glenn Zucman - https://www.flickr.com/photos/18182611@N00
Hadoop
Distributed data storage and processing on commodity
hardware makes big data feasible. One open
source project for this is Apache Hadoop.
cc: NASA Goddard Photo and Video - https://www.flickr.com/photos/24662369@N07
Hadoop enables distributed processing of large data sets
across clusters of computers using simple programming
models. It is designed to scale up to thousands of machines.
cc: solofotones - https://www.flickr.com/photos/14754973@N08
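The programming model most associated with Hadoop is MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) for a word count in plain Python. It does not use any Hadoop API; the function names and the sample documents are assumptions made for illustration only.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, then shuffle and reduce the pairs."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


docs = ["big data needs big clusters", "data drives decisions"]  # made up
counts = word_count(docs)
print(counts)
```

Because each map call looks at one document and each reduce call looks at one key, a framework is free to run those calls on different machines.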
Rather than rely on hardware to deliver high
availability, the Hadoop library is designed to detect
and handle failures at the application layer, thereby
delivering a highly available service on top of machines
that may individually fail.
cc: neil cummings - https://www.flickr.com/photos/23874985@N07
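Handling failure in software can be pictured as rescheduling a task on another machine when the first one fails. The following is a toy sketch under that assumption, not Hadoop's actual scheduler: `WorkerFailure`, `make_worker`, `run_with_failover` and the node names are all hypothetical.

```python
class WorkerFailure(Exception):
    """Raised by a simulated worker that has crashed."""


def make_worker(name, healthy):
    """Return a hypothetical worker: a function that runs a task or fails."""
    def run(task):
        if not healthy:
            raise WorkerFailure(f"{name} is down")
        return f"{task} done on {name}"
    return run


def run_with_failover(task, workers):
    """Try each worker in turn; reschedule the task when one fails."""
    failures = []
    for worker in workers:
        try:
            return worker(task), failures
        except WorkerFailure as err:
            # The failure is detected and handled in software.
            failures.append(str(err))
    raise RuntimeError(f"no healthy worker for {task}: {failures}")


workers = [
    make_worker("node-1", healthy=False),
    make_worker("node-2", healthy=True),
]
result, failures = run_with_failover("block-42", workers)
print(result, failures)
```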
The ELK stack
Another open source tool used for Big Data is
Elasticsearch, which can run very fast searches on
semi-structured or unstructured datasets.
cc: DocChewbacca - https://www.flickr.com/photos/49462908@N00
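Fast text search of this kind generally relies on an inverted index, which maps each term to the documents that contain it (Elasticsearch builds on Apache Lucene, which uses this structure). The sketch below is a deliberately small version in plain Python; it does not call Elasticsearch, and the function names and sample documents are assumptions for illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {  # made-up log messages
    1: "disk failure on node seven",
    2: "user login failure",
    3: "node seven restarted",
}
index = build_inverted_index(docs)
hits = search(index, "node seven")
print(hits)
```

A query only touches the entries for its own terms, which is why lookups stay fast as the number of documents grows.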
Elasticsearch is part of the Elastic Stack, also known as
the ELK stack, which also contains Logstash (a data collection
and log parsing tool) and Kibana (for analytics and
visualization).
cc: PLeia2 - https://www.flickr.com/photos/64684255@N00
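To make "log parsing" concrete, the sketch below turns raw log lines into structured events and counts them by level, roughly the kind of summary a dashboard might chart. It is plain Python with a regular expression, not Logstash configuration; the log format, `LOG_PATTERN` and the sample lines are invented for this example.

```python
import re
from collections import Counter

# Assumed line format: "<timestamp> <LEVEL> <service>: <message>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<service>\S+): (?P<message>.*)"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


raw_lines = [  # made-up log lines
    "2024-01-01T10:00:00Z ERROR checkout: payment gateway timeout",
    "2024-01-01T10:00:01Z INFO search: query served in 12ms",
    "not a log line",
]

# Keep only the lines that parsed into structured events.
events = [e for e in map(parse_line, raw_lines) if e is not None]
levels = Counter(event["level"] for event in events)
print(events[0]["message"], dict(levels))
```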
Apache Kafka
Batch data migration using ETL (Extract - Transform - Load) often
struggles with the volume and velocity of Big Data, and hence the
traditional ETL architecture is increasingly giving way to real-time data streaming.
cc: SidPix - https://www.flickr.com/photos/22357152@N02
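The difference between the two styles can be sketched in a few lines: a batch job waits for the whole data set and produces one answer, while a streaming job updates its answer as each event arrives. The function names and the order amounts below are assumptions made for illustration.

```python
def batch_total(records):
    """Batch style: wait for the full data set, then process it in one pass."""
    return sum(record["amount"] for record in records)


def streaming_totals(events):
    """Streaming style: update and emit the result as each event arrives."""
    total = 0
    for event in events:
        total += event["amount"]
        yield total


orders = [{"amount": 10}, {"amount": 25}, {"amount": 5}]  # made-up orders

final = batch_total(orders)
running = list(streaming_totals(iter(orders)))
print(final, running)
```

Both approaches end at the same total, but the streaming version has a usable result after every event instead of only at the end of the load.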
Apache Kafka is a high-throughput distributed messaging
system that is being adopted by hundreds of
companies to manage their real-time data.
cc: r2hox - https://www.flickr.com/photos/72764087@N00
Kafka is well suited to building data
pipelines: it is reliable, scalable, and
efficient.
cc: ikarusmedia - https://www.flickr.com/photos/32650580@N06
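Kafka's core idea is an append-only log that producers write to and that each consumer reads at its own position (its offset). The sketch below models that idea in memory. It is not the Kafka client API: `TopicLog`, `Consumer` and the sensor readings are hypothetical names and data for this example, and real Kafka adds partitioning, replication and persistence on top.

```python
class TopicLog:
    """In-memory stand-in for one partition of a topic: an append-only list."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Producer side: add a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Consumer side: read records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own offset, so several consumers can read independently."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        """Fetch the next batch and advance this consumer's offset."""
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = TopicLog()
for reading in ("temp=21", "temp=22", "temp=23"):  # made-up sensor data
    log.append(reading)

dashboard = Consumer(log)
archiver = Consumer(log)
first = dashboard.poll(max_records=2)
rest = dashboard.poll()
everything = archiver.poll()
print(first, rest, everything)
```

Because reading does not remove records, the same stream can feed several downstream systems, which is what makes a log a convenient backbone for a data pipeline.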
R - the language and environment for
statistical computing
R is an integrated suite of software
facilities for data manipulation, calculation
and graphical display.
cc: Crystal Writer - https://www.flickr.com/photos/17483452@N00
With over 2 million users worldwide, R is rapidly
becoming the leading programming language in
statistics and data science.
cc: Marc_Smith - https://www.flickr.com/photos/49503165485@N01
It is a great tool for data analysis and
can be used efficiently on very large
data sets.
cc: Régis Gaidot - https://www.flickr.com/photos/22019171@N00
Big Data is the next frontier for innovation,
competition and productivity - in all fields from
healthcare to retail, from manufacturing to personal
and location data.
cc: danielfoster437 - https://www.flickr.com/photos/17423713@N03
In most industries, established competitors and new
entrants will leverage data-driven strategies to
innovate, compete, and capture value from deep
real-time information.
cc: verbeeldingskr8 - https://www.flickr.com/photos/35429044@N04
We at appworx.cc offer data services that can help
retail and other clients achieve their big data goals
quickly.
https://www.appworx.cc/data
cc: Jason Michael - https://www.flickr.com/photos/70194213@N00
