Hadoop Admin Best Practices with HDP 2.3
Part-2
 We offer INSTRUCTOR-LED training - both Online LIVE & Classroom sessions
 Classroom sessions are available in Bangalore & Delhi (NCR)
 We are the ONLY education delivery partners for Mulesoft, Elastic, Pivotal & Lightbend in India
 We have delivered more than 5000 trainings and offer over 400 courses, with a vast pool of over 200 experts to make YOU the EXPERT!
FOLLOW US ON SOCIAL MEDIA TO STAY UPDATED ON THE UPCOMING WEBINARS
Online and Classroom Training on Technology Courses
at SpringPeople
Certified Partners
Non-Certified Courses
…and many more
…NOW
The Hadoop Ecosystem
The HDP 2.3 Platform Versions
Covered Till Now
1. Use Ambari – Cluster Management Tool
2. More of WebHDFS Access
3. WebHDFS
4. Use More of HDFS Access Control Lists
5. Use HDFS Quotas
6. Understanding of YARN Components
7. Adding, Deleting, or Replacing Worker Nodes
8. Rack Awareness
9. NameNode High Availability
10. ResourceManager High Availability
11. Ambari Metrics System
12. What to Backup?
13 - Setting an Appropriate Directory Space Quota
• Best practice is to also set space limits on home directories. To set a 12 TB limit:
$ hdfs dfsadmin -setSpaceQuota 12t /user/username
• The quota includes space used by replication
• It limits actual raw disk usage, not logical file size
• Example:
• If storing 1TB and replication factor is 3
• 3TB is needed
• Quota can be set on any directory
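The replication arithmetic above can be sketched as a quick check (a minimal illustration using the 1 TB / replication-factor-3 example; the TB constant here is only for the ratio, not a claim about how HDFS parses the `12t` suffix):

```python
def raw_space_needed(logical_bytes, replication_factor):
    """HDFS space quotas count raw bytes, i.e. every replica of every block."""
    return logical_bytes * replication_factor

TB = 2 ** 40  # one terabyte-scale unit; only the ratio matters here

# Storing 1 TB of data with replication factor 3 consumes 3 TB of quota.
assert raw_space_needed(1 * TB, 3) == 3 * TB

# A 12 TB quota therefore holds at most 4 TB of logical data at RF 3.
quota = 12 * TB
print(quota // raw_space_needed(1 * TB, 3))  # number of 1 TB datasets that fit
```

This is why a quota sized to the logical data alone will be exhausted three times faster than expected on a default-replication cluster.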
14 - Configuring Trash
• Enable Trash by setting the time delay for checkpoint removal in core-site.xml:
• fs.trash.interval
• Delay is set in minutes (24 hours would be 1440 minutes)
• Recommendation is to set to 360 minutes (6 hours)
• Setting the value to 0 disables Trash
• Files deleted programmatically are deleted immediately
• Files can be immediately deleted from the command line using -skipTrash
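A core-site.xml fragment matching the 360-minute recommendation above (the property name is the real one; remember that a value of 0 disables Trash):

```xml
<property>
  <name>fs.trash.interval</name>
  <!-- minutes; 360 = 6 hours, 1440 = 24 hours, 0 disables Trash -->
  <value>360</value>
</property>
```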
15 - Compression Needs and Tradeoffs
 Compressing data can speed up data-intensive I/O operations
• MapReduce jobs are almost always I/O bound
 Compressed data can save storage space and speed up data transfers across the network
• Capital allocation for hardware can go further
 Reduced I/O and network load can result in significant performance improvements
• MapReduce jobs can finish faster overall
 But, CPU utilization and processing time increase during compression and decompression
• Understanding the tradeoffs is important for the overall performance of a MapReduce pipeline
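The storage-vs-CPU tradeoff is easy to demonstrate in miniature (a toy sketch, not a MapReduce benchmark; codec behavior on real cluster data will differ):

```python
import gzip
import time

# Highly repetitive data compresses well, like many log/text datasets in HDFS.
data = b"2016-08-12 INFO block report received from datanode\n" * 20000

start = time.perf_counter()
compressed = gzip.compress(data)
cpu_cost = time.perf_counter() - start

ratio = len(data) / len(compressed)
print(f"original: {len(data)} bytes, compressed: {len(compressed)} bytes")
print(f"ratio: {ratio:.1f}x, compression took {cpu_cost * 1000:.1f} ms")

# Less data to read, write, and ship over the network -- but CPU paid for it.
assert len(compressed) < len(data)
assert gzip.decompress(compressed) == data  # lossless round trip
```

The same shape of tradeoff applies to cluster codecs (gzip, Snappy, LZO): higher compression ratios reduce I/O and network load at the cost of more CPU per task.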
16 - Sqoop Security
• Database Authentication:
• Sqoop needs to authenticate to the RDBMS
• How?
• Usually this involves a username/password
(Oracle Wallet is the exception)
• Passwords can be hard-coded in scripts (not recommended)
• Password usually stored in plaintext in a file protected by the filesystem
• Hadoop Credential Management Framework added in HDP 2.2
• Not a keystore, but a way to interface with keystore backends
• Passwords can be stored in a keystore and not in plain text
• Can help with “no passwords in plaintext” requirements
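A command sketch of the credential-provider approach (requires a live cluster; the keystore path, alias, and connection details here are hypothetical examples, not values from the deck):

```shell
# Store the database password in a JCEKS keystore instead of plain text
hadoop credential create mydb.password \
    -provider jceks://hdfs/user/sqoop/mydb.jceks

# Reference the alias at import time; no password appears on the command line
sqoop import \
    -Dhadoop.security.credential.provider.path=jceks://hdfs/user/sqoop/mydb.jceks \
    --connect jdbc:mysql://db.example.com/sales \
    --username sqoop_user \
    --password-alias mydb.password \
    --table orders
```

Note the generic `-D` option must come before the Sqoop-specific arguments.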
17 - distcp Configurations
• If Distcp runs out of memory before copying:
• Possible Cause: Number of files/directories being copied from source
path(s) is extremely large (e.g. 100,000 paths)
• Change: heap size
- export HADOOP_CLIENT_OPTS="-Xms64m -Xmx1024m"
• Map Sizing
• If -m is not specified: Default to 20 maps max
• Tune the number of maps (-m) to match:
- Size of the source and destination cluster
- The size of the copy
- Available bandwidth
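Putting the heap and map-count tuning together (a sketch requiring a live cluster; the paths and NameNode hosts are placeholders):

```shell
# Raise the client heap before copying a very large file tree
export HADOOP_CLIENT_OPTS="-Xms64m -Xmx1024m"

# Override the default cap of 20 maps when bandwidth and clusters allow it
hadoop distcp -m 40 \
    hdfs://source-nn:8020/data/warehouse \
    hdfs://dest-nn:8020/data/warehouse
```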
18 - Falcon
 Centrally manages data lifecycle
• Centralized definition & management of pipelines for data ingest, process, and export
 Supports business continuity and disaster recovery
• Out-of-the-box policies for data replication and retention
• End-to-end monitoring of data pipelines
 Addresses basic audit & compliance requirements
• Visualize data pipeline lineage
• Track data pipeline audit logs
• Tag data with business metadata
19 - Running Balancer
• Can be run periodically as a batch job
• Examples: every 24 hours or weekly
• Run after new nodes have been added to the cluster
• To run balancer:
hdfs balancer [-threshold <threshold>] [-policy <policy>]
• Runs until there are no more blocks to move, or until it loses contact with the NameNode
• Can be stopped with a Ctrl+C
20 - HDFS Snapshots
Create HDFS directory snapshots
Fast operation - only metadata affected
Results in .snapshot/ directory in the HDFS directory
Snapshots are named or default to timestamp
Directories must be made snapshottable
Snapshot Steps:
– Allow snapshot on directory
hdfs dfsadmin -allowSnapshot foo/bar/
– Create snapshot for directory and optionally provide snapshot name
hdfs dfs -createSnapshot foo/bar/ mysnapshot_today
– Verify snapshot
hdfs dfs -ls foo/bar/.snapshot
21 - HDFS Data – Automate & Restore
• Use Falcon/Oozie to automate backups
• Falcon utilizes Oozie as a workflow scheduler
• distcp is an Oozie action
- use -update and -prbugp (preserve replication, block size, user, group, and permissions)
• Restoring is the reverse process of backups
1. On your backup cluster choose which snapshot to restore
2. Remove/move target directory on production system
3. Run distcp without the -update option
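The three restore steps above can be sketched as commands (requires live clusters; the NameNode hosts, /data path, and snapshot name are placeholders):

```shell
# 1. Pick the snapshot to restore on the backup cluster
hdfs dfs -ls hdfs://backup-nn:8020/data/.snapshot/

# 2. Move the damaged directory aside on the production cluster
hdfs dfs -mv /data /data.corrupt

# 3. Copy the snapshot back -- note: no -update, this is a full restore
hadoop distcp -prbugp \
    hdfs://backup-nn:8020/data/.snapshot/snap-20160812 \
    hdfs://prod-nn:8020/data
```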
22 - Apache Ranger
www.springpeople.com | training@springpeople.com
Upcoming Hortonworks Classes at
SpringPeople
Classroom (Bengaluru)
05 - 08 Sept
26 - 28 Sept
10 - 13 Oct
07 - 10 Nov
05 - 08 Dec
19 - 21 Dec
Online LIVE
22 - 31 Aug
05 - 17 Sept
19 Sept - 01 Oct

Editor's Notes

  • #5: Feel free to spend a lot of time on this slide. Many of these frameworks are not discussed later in the course, so now is likely your only chance to explain them. Let the students ask questions and make the discussion interactive.
  • #6: So what is Hortonworks Data Platform (HDP)? It is an open enterprise version of Hadoop distributed by Hortonworks. It includes a single installation utility that installs many of the Apache Hadoop software frameworks. Even the installer is pure Hadoop. The primary benefit is that Hortonworks has put HDP through a rigorous set of system, functional, and regression tests to ensure that the versions of the frameworks included in the distribution work seamlessly together in a secure and reliable manner. Because HDP is an open enterprise version of Hadoop, it is imperative that it uses the best combination of the most stable, reliable, secure, and current frameworks.
  • #9: A related property, fs.trash.checkpoint.interval, specifies the number of minutes between trash checkpoints. It should be smaller than or equal to fs.trash.interval. Each time the checkpointer runs, it creates a new checkpoint of the current trash contents and removes checkpoints created more than fs.trash.interval minutes ago. The default value of this property is zero.