Big Data Architect - Hadoop & its Eco-system Training
Objective
Participants will learn to install a Hadoop cluster, understand the basic and
advanced concepts of Map Reduce, and apply the best practices for Apache Hadoop
development as experienced by the developers, architects, and data analysts of
core Apache Hadoop. During the course they will also learn the following:
1. Hadoop Ecosystem
2. Best programming practices for Map Reduce
3. System administration issues with other Hadoop projects such as Hive,
Pig, and Sqoop
4. Configuring the Map Reduce environment with the Eclipse IDE
5. Running MR Unit Tests on MR Code
6. Advanced Map Reduce Algorithms and techniques
7. Working with Pig and HIVE
8. Working with NoSQL with emphasis on HBase
Note: The course will be 40% theoretical discussion and 60% actual
hands-on work.
Duration:
28-30 hours
Audience
This course is designed for anyone who:
1. Wants to architect a project using Hadoop and its ecosystem
components.
2. Wants to develop Map Reduce programs
3. Is a Business Analyst or Data Warehousing professional looking for an
alternative approach to data analysis and storage.
Pre-Requisites
1. The participants should have at least basic knowledge of Java.
2. Any experience with a Linux environment will be very helpful.
Course Outline
1. What is Big Data & Why Hadoop?
• Big Data characteristics; challenges with traditional systems
2. Hadoop Overview & Ecosystem
• Anatomy of Hadoop Cluster, Installing and Configuring Hadoop
• Hands-On Exercise
3. Hadoop Architecture
• Components in Hadoop
• Interaction between different Components
• Basic Understanding of each component
4. HDFS – Hadoop Distributed File System
• NameNodes and DataNodes
• Hands-On Exercise
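The hands-on work in this section typically revolves around the HDFS shell. A few representative commands follow (the paths are illustrative; on older releases `hadoop fs` is used in place of `hdfs dfs`):

```shell
hdfs dfs -mkdir -p /user/training/input      # create a directory in HDFS
hdfs dfs -put data.txt /user/training/input  # copy a local file into HDFS
hdfs dfs -ls /user/training/input            # list directory contents
hdfs dfs -cat /user/training/input/data.txt  # print a file's contents
hdfs dfs -rm -r /user/training/input         # remove a directory recursively
```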
5. Map Reduce Anatomy
• How Map Reduce works
• The Mapper & Reducer, InputFormats & OutputFormats, Data Types
• Hands-On Exercise
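A minimal word-count sketch in the Hadoop Streaming style may help fix the anatomy in mind: the mapper emits (word, 1) pairs and the reducer sums the counts per word. In a real Streaming job the mapper and reducer would be separate scripts reading stdin and writing tab-separated lines; here they are plain functions so the data flow is easy to follow.

```python
from itertools import groupby

def mapper(lines):
    """Emit one (word, 1) pair per token, like a Mapper's map() calls."""
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Sum the counts for each key. The input must be sorted by key,
    which the shuffle-and-sort phase guarantees on a real cluster."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

counts = dict(reducer(mapper(["the quick brown fox", "the lazy dog"])))
# "the" appears twice; every other word once
```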
6. Understanding Cloudera Distribution
• What is CDH?
• Components in CDH
• Hands on Exercise
7. Understanding HortonWorks Distribution
• What is HDP?
• Components in HDP
• Hands on Exercise
8. Pseudo-Distributed Installation of Vanilla Hadoop
• Hadoop Extraction and Installation
• Configuration / XML Files
• Hands on Exercise
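The key XML files edited in this exercise are core-site.xml and hdfs-site.xml. A minimal pseudo-distributed sketch (the port and replication value are the commonly used single-machine defaults, not the only valid choices):

```xml
<!-- core-site.xml: point clients at a single local NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: one machine, so a replication factor of 1 -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```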
9. YARN
• Need for YARN
• Architecture of YARN
10. Installation of YARN on Ubuntu
• Configuration Settings
• Differences between Gen1 and Gen2
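The single most visible configuration change from Gen1 to Gen2 is telling MapReduce to run on YARN rather than on the classic JobTracker/TaskTracker runtime, which is a one-property edit:

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN (Gen2) instead of the
     classic Gen1 JobTracker/TaskTracker runtime -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```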
11. Developing Map Reduce Programs
• Setting up Eclipse Development Environment, Creating Map Reduce
Projects, Debugging Map Reduce Code
• Hands-On Exercise
12. Advanced Tips & Techniques
• Determining the optimal number of reducers, skipping bad records
• Partitioning into multiple output files & passing parameters to tasks
• Optimizing Hadoop Cluster & Performance Tuning
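Partitioning into multiple output files comes down to which reducer a key is routed to, since each reducer writes one output file. A sketch of the idea: Hadoop's default HashPartitioner assigns partition = hash(key) mod numReduceTasks, and a custom partitioner simply replaces that function. The prefix partitioner below is a hypothetical illustration, not a Hadoop API.

```python
def hash_partition(key: str, num_reducers: int) -> int:
    """Deterministic hash partition, in the spirit of Java's
    String.hashCode, masked to a non-negative value."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0x7FFFFFFF
    return h % num_reducers

def prefix_partition(key: str, num_reducers: int) -> int:
    """Hypothetical custom partitioner: route keys sharing a first
    character to the same reducer, so e.g. all '2014-...' date keys
    land in the same output file."""
    return ord(key[0]) % num_reducers
```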
13. Monitoring & Management of Hadoop
• Managing HDFS with Tools
• Using HDFS & Job Tracker Web UI
• Routine Administration Procedures
• Commissioning and decommissioning of nodes
• Hands-On Exercise
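Decommissioning a DataNode is normally driven by an exclude file plus a NameNode refresh. A sketch of the common flow (the hostname and file path are illustrative; hdfs-site.xml must already point dfs.hosts.exclude at the file):

```shell
echo "node07.example.com" >> /etc/hadoop/conf/dfs.exclude  # node to retire

hdfs dfsadmin -refreshNodes  # NameNode begins re-replicating the node's blocks
hdfs dfsadmin -report        # the node now shows as decommissioning
```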
14. Using Hive
• Hive as a Data Warehouse
• Creating External & Internal Tables plus Loading Data
• Writing HiveQL queries for data retrieval
• Creating partitions and querying data
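The external-table, partition, and query topics above can be sketched in HiveQL as follows (the table, column, and path names are illustrative; dropping an external table leaves the underlying HDFS data in place):

```sql
-- External table over files already in HDFS
CREATE EXTERNAL TABLE page_views (
  user_id STRING,
  url     STRING,
  hits    INT
)
PARTITIONED BY (view_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/training/page_views';

-- Load one partition, then query it
LOAD DATA INPATH '/user/training/2014-01-01.tsv'
  INTO TABLE page_views PARTITION (view_date = '2014-01-01');

SELECT url, SUM(hits) AS total
FROM page_views
WHERE view_date = '2014-01-01'
GROUP BY url
ORDER BY total DESC
LIMIT 10;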
15. Using Pig
• Why Pig and its benefits
• Loading data into PigStorage
• Querying data from PigStorage
• Hands-On Exercise
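Loading and querying with PigStorage might look like the following Pig Latin sketch (relation names, paths, and the hit threshold are illustrative):

```pig
-- Load tab-delimited data, filter, group, aggregate, and store
views   = LOAD '/user/training/page_views' USING PigStorage('\t')
          AS (user_id:chararray, url:chararray, hits:int);
popular = FILTER views BY hits > 100;
by_url  = GROUP popular BY url;
totals  = FOREACH by_url GENERATE group AS url, SUM(popular.hits) AS total;
STORE totals INTO '/user/training/popular_urls' USING PigStorage('\t');
```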
16. Sqoop
• Importing data from and exporting data to an RDBMS
• Hands-On Exercise
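A typical import/export pair looks like the sketch below (the JDBC URL, credentials, and table names are placeholders):

```shell
# Pull an RDBMS table into HDFS with 4 parallel map tasks
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username training --password-file /user/training/.dbpass \
  --table orders \
  --target-dir /user/training/orders \
  --num-mappers 4

# Push computed results back out to the RDBMS
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username training --password-file /user/training/.dbpass \
  --table order_totals \
  --export-dir /user/training/order_totals
```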
17. Understanding the Other SQL Options in Hadoop
• Intro to Stinger
• Intro to Impala
18. Hadoop Best Practices and Use Cases
19. NoSQL Introduction
• What is NoSQL?
• Variations of NoSQL
• Advantages of columnar databases
20. HBase
• HBase Overview and Architecture
• HBase vs. RDBMS
• HBase Table Design
• Column Families and Regions
• HBase Java API code
• Hands-On Exercise
• HBase Installation
• HBase shell commands
• Java Administration API
• Performance Tuning
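The shell-commands exercise typically covers creating a table with a column family, writing and reading cells, and scanning. A short session sketch (the table, family, and row names are illustrative):

```text
# Inside `hbase shell`
create 'users', 'info'                      # table with one column family
put 'users', 'row1', 'info:name', 'Asha'    # write a cell
put 'users', 'row1', 'info:city', 'Pune'
get 'users', 'row1'                         # read back one row
scan 'users', {LIMIT => 10}                 # scan the first 10 rows
disable 'users'                             # required before dropping
drop 'users'
```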
21. Oozie – Workflow Scheduler
• Why workflow in Hadoop
• Understanding configuration in Oozie
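An Oozie workflow is an XML graph of actions. A minimal sketch with a single MapReduce action follows (the workflow name and property values are illustrative; `${jobTracker}`, `${nameNode}` and the directory variables are supplied at submission time through a job.properties file):

```xml
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Word count failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```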
Takeaways from the Course
1. Understanding of What and Why of Hadoop with its Eco-System
Components.
2. Ability to write Map Reduce programs in a given scenario
3. Ability to correctly architect and implement the Best Practices in Hadoop
Development
4. Ability to Manage and Monitor Hadoop
5. Ability to manage the interactions between the different Hadoop
components.
