www.edureka.co/r-for-analytics
www.edureka.co/hadoop-admin
Hadoop: A Highly Available and Secure Enterprise Data Warehousing Solution
Slide 2
Agenda
At the end of this webinar, we will know about:
• What Big Data is
• Why enterprises care about Big Data
• Why your DWH needs Hadoop
• Security in Hadoop
• How Hadoop maintains high availability
• Data warehousing tools in Hadoop
Slide 3
What Is Big Data?
Slide 5
What Is Wrong with Our Traditional DWH Solutions?
Slide 6
When Does an RDBMS Make No Sense?
• Storing unstructured data such as images and video
• Processing images and video
• Storing and processing other large files
  – PDFs, Excel files
• Processing large blocks of natural-language text
  – Blog posts, job ads, product descriptions
• Processing semi-structured data
  – CSV, JSON, XML, log files
  – Sensor data
Slide 7
When Does an RDBMS Make No Sense? (continued)
• Ad hoc, exploratory analytics
• Integrating data from external sources
• Data cleanup tasks
• Very advanced analytics (machine learning)
Slide 8
Big Problems with Big Data
Big data is:
– Unstructured
– Unprocessed
– Un-aggregated
– Un-filtered
– Repetitive
– Low quality
– And generally messy
Oh, and there is a lot of it.
Slide 9
Technical Challenges
• Storage capacity
• Storage throughput
• Pipeline throughput
• Processing power
• Parallel processing
• System integration
• Data analysis
These challenges call for:
• Scalable storage
• Massively parallel processing
• Ready-to-use tools
Slide 10
Technical Challenges
Too many channels for data
Slide 11
Why Do Enterprises Care About Big Data?
Slide 14
You said an RDBMS does not have a solution for Big Data. Then who does?
Slide 15
Hadoop: The Savior
Hadoop: "I have the solution for the Big Data problem."
Slide 16
How Hadoop Differs from an RDBMS
Hadoop can store all types of data, which gives you the flexibility to analyze all of it.
You can drill down into big data to find even the rare insights that were not possible to reach earlier.
Slide 18
Hadoop Is the New DWH Solution
ETL (the traditional DWH approach):
• Before loading, you must transform the data into a particular format
• This puts a restriction on the type of data that can be stored
ELT (the Hadoop approach):
• There is no need to transform the data beforehand
• You can have all kinds of data on board
• Freedom to work with all data
First load the data, then do whatever you want with it. This is possible because of cheap storage and the distributed HDFS.
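The load-first, transform-later idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Hadoop code: the in-memory list `raw_store` stands in for files landed in HDFS, and the record layouts and function names are invented for the example.

```python
import csv
import io
import json

# Stand-in for raw files landed in HDFS: nothing is transformed on load.
raw_store = []


def load(raw_line):
    """ELT 'load' step: keep the record exactly as it arrived."""
    raw_store.append(raw_line)


def parse_on_read(raw_line):
    """ELT 'transform' step, applied only when a query reads the data."""
    try:
        record = json.loads(raw_line)
        if isinstance(record, dict):
            return record
    except json.JSONDecodeError:
        pass
    # Fall back to a hypothetical two-column CSV layout: user,amount
    fields = next(csv.reader(io.StringIO(raw_line)), [])
    if len(fields) == 2:
        return {"user": fields[0], "amount": fields[1]}
    # Anything else stays available as free text for later analysis.
    return {"text": raw_line}


def total_amount():
    """A query that decides at read time which records it can use."""
    total = 0.0
    for line in raw_store:
        record = parse_on_read(line)
        if "amount" in record:
            total += float(record["amount"])
    return total


load('{"user": "ann", "amount": 12.5}')
load("bob,7.5")
load("free-form log line with no structure")
print(total_amount())
```

In an ETL pipeline the third record would have to be rejected or reshaped before loading; here it is simply stored and ignored by this particular query.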
Slide 19
Hadoop Does ELT, Not ETL
Hadoop is the new data warehouse for all kinds of BI requirements.
Slide 20
Core Features of Hadoop
Slide 21
Hadoop Is Fault Tolerant and Super Consistent
Slide 22
Maintaining High Availability (HA)
In distributed computing, failure is the norm, which means YARN should have an acceptable level of availability.
Limitations of a single NameNode:
• No horizontal scaling
• No high availability
[Diagram: the client gets block locations from the NameNode, which handles the namespace (NS) and block management, and then reads data from the DataNodes.]
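The read path in this diagram can be sketched as a toy model. The class and method names below are illustrative assumptions, not the HDFS client API; the point is only that the NameNode serves metadata (which blocks, on which DataNodes) while the bytes come from the DataNodes.

```python
class DataNode:
    """Stores block contents keyed by block ID."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds only metadata: the namespace and the block map."""

    def __init__(self):
        self.namespace = {}        # file path -> ordered list of block IDs
        self.block_locations = {}  # block ID -> DataNodes holding a replica

    def get_block_locations(self, path):
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


def read_file(namenode, path):
    """Client read: ask the NameNode where blocks live, then read from DataNodes."""
    data = b""
    for block_id, replicas in namenode.get_block_locations(path):
        data += replicas[0].read_block(block_id)  # any replica would do
    return data


dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.blocks["blk_1"] = b"hello "
dn2.blocks["blk_1"] = b"hello "  # second replica of the same block
dn2.blocks["blk_2"] = b"world"

nn = NameNode()
nn.namespace["/logs/a.txt"] = ["blk_1", "blk_2"]
nn.block_locations = {"blk_1": [dn1, dn2], "blk_2": [dn2]}

print(read_file(nn, "/logs/a.txt"))
```

Every lookup goes through the single `nn` object, so losing it makes all files unreachable even though the DataNodes still hold the data. That is the availability gap this slide points to.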
Slide 23
NameNode – Single Point of Failure
Secondary NameNode:
• Not a hot standby for the NameNode
• Connects to the NameNode every hour*
• Housekeeping and backup of NameNode metadata
• Saved metadata can be used to rebuild a failed NameNode
[Diagram: the NameNode, a single point of failure, hands its metadata to the Secondary NameNode, which says: "You give me metadata every hour, I will make it secure."]
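The housekeeping role of the Secondary NameNode can be illustrated with a small sketch of a checkpoint: the last saved image of the namespace is merged with the edits logged since then. The data structures and names here are simplifying assumptions (a namespace is just a set of paths), not Hadoop's on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (a set of paths)."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge the last saved image with the edits logged since then."""
    merged = set(fsimage)
    for edit in edit_log:
        apply_edit(merged, edit)
    return frozenset(merged)


# State held by the (toy) NameNode between checkpoints.
fsimage = frozenset({"/data/a"})
edit_log = [("create", "/data/b"), ("delete", "/data/a"), ("create", "/data/c")]

# The periodic checkpoint: a new image is produced and the log can be truncated.
new_fsimage = checkpoint(fsimage, edit_log)
edits_after_checkpoint = [("create", "/data/d")]

# If the NameNode is lost, the saved image restores everything up to the
# checkpoint; edits made after it are missing, so this is not a hot standby.
recovered = set(new_fsimage)
print(sorted(recovered))
```

The gap between `recovered` and the edits made after the checkpoint is why the slide stresses that the Secondary NameNode is a backup aid rather than a failover target.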
Slide 24
Hadoop 2.0 Cluster Architecture - HA
HDFS high availability:
• All namespace edits are logged to shared NFS storage (the shared edit logs); there is a single writer, enforced by fencing
• The Standby NameNode reads the edit logs and applies them to its own namespace
[Diagram: the HDFS layer (NameNode High Availability) shows an Active NameNode, a Standby NameNode, a Secondary NameNode, and DataNodes; the YARN layer (Next Generation MapReduce) shows a Resource Manager and Node Managers running Containers and App Masters; a Client connects to the cluster.]
http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
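The shared-edit-log arrangement can be sketched as a toy model: one writer at a time, a standby that replays the log, and a failover that fences the old writer before promoting the new one. All class and method names are assumptions made for illustration; real clusters rely on configured fencing methods rather than a check like the one below.

```python
class FencedError(Exception):
    """Raised when a NameNode that is no longer the writer tries to log an edit."""


class SharedEditLog:
    """Toy stand-in for the shared edits directory: exactly one writer at a time."""

    def __init__(self, writer):
        self.writer = writer
        self.edits = []

    def append(self, node, edit):
        if node != self.writer:
            raise FencedError(f"{node} is fenced off from the edit log")
        self.edits.append(edit)

    def fence_and_promote(self, new_writer):
        self.writer = new_writer


class StandbyNameNode:
    """Tails the shared log and applies edits to its own namespace copy."""

    def __init__(self):
        self.namespace = set()
        self.applied = 0

    def catch_up(self, log):
        for op, path in log.edits[self.applied:]:
            if op == "create":
                self.namespace.add(path)
            elif op == "delete":
                self.namespace.discard(path)
        self.applied = len(log.edits)


log = SharedEditLog(writer="nn1")
standby = StandbyNameNode()

log.append("nn1", ("create", "/data/a"))
log.append("nn1", ("create", "/data/b"))
standby.catch_up(log)

# Failover: fence the old active first, then let the standby take over.
log.fence_and_promote("nn2")
try:
    log.append("nn1", ("delete", "/data/a"))  # stale active is rejected
    old_active_blocked = False
except FencedError:
    old_active_blocked = True

log.append("nn2", ("create", "/data/c"))  # the new active can write
print(sorted(standby.namespace), old_active_blocked)
```

Because the standby had already replayed every edit, it takes over with an up-to-date namespace, and the rejected write shows why a single writer matters: two active NameNodes logging edits at once would corrupt the shared state.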
Slide 25
Demo: Achieving HDFS and YARN High Availability
Slide 26
Hadoop Is Secure
Slide 27
Security
• Service-level authorization and web proxy capabilities in YARN
• Access Control Lists (ACLs): the Hadoop Distributed File System (HDFS) implements a permissions model for files and directories that shares much of the POSIX model
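A POSIX-style permission check of the kind this slide refers to can be sketched as follows. This is a simplified illustration, not HDFS code: the `Inode` class and `is_allowed` function are hypothetical, and superuser handling and ACL entries for named users or groups are left out.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class Inode:
    owner: str
    group: str
    mode: int  # e.g. 0o640: owner rw-, group r--, other ---


def is_allowed(inode, user, user_groups, wanted):
    """POSIX-style check: exactly one of the owner/group/other classes applies."""
    if user == inode.owner:
        bits = (inode.mode >> 6) & 7
    elif inode.group in user_groups:
        bits = (inode.mode >> 3) & 7
    else:
        bits = inode.mode & 7
    return (bits & wanted) == wanted


report = Inode(owner="ann", group="analysts", mode=0o640)
print(is_allowed(report, "ann", {"analysts"}, READ | WRITE))
print(is_allowed(report, "bob", {"analysts"}, READ))
print(is_allowed(report, "bob", {"analysts"}, WRITE))
print(is_allowed(report, "eve", {"guests"}, READ))
```

With mode `0o640`, the owner can read and write, members of the group can only read, and everyone else is denied.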
Slide 28
Security – Simple Flow
Security risks:
• Insufficient authentication
  – Users and services are not authenticated
• No privacy and no integrity
  – Insecure network transport
  – No message-level security
• Arbitrary code execution
  – No user verification for MapReduce code execution; malicious users could submit a job
[Diagram: a Client submits work to the Job Tracker, which drives Task Trackers running Tasks on top of HDFS.]
Slide 29
Checking Resource Usage and User Permissions
Managing users, permissions, quotas, etc.
Demo: ACLs
Slide 31
Hadoop provides a traditional SQL interface as well as a NoSQL interface for data storage.
Slide 32
Hive
Slide 33
Hive Architecture
Slide 34
HBase and Its Architecture
Slide 35
Hive and HBase Integration
Slide 36
Questions
Slide 37
Survey
Your feedback is vital to us, be it a compliment, a suggestion, or a complaint. It helps us make your experience better!
Please spare a few minutes to take the survey after the webinar.

Editor's Notes

  • #9: Big data is not called big data because it fits neatly on a thumb drive. It requires a lot of storage, partly because there is a lot of it and partly because it is unstructured, unprocessed, un-aggregated, repetitive, and generally messy.