HDFS Internals
Bhupesh Chawda
bhupesh@apache.org
DataTorrent

Image Source: https://help.marklogic.com/news/list/Index/10
Agenda

What are Blocks?
● A physical storage disk has a block size: the minimum amount of data it can
read or write. Normally 512 bytes.
● File systems for a single disk also deal with data in blocks. Normally a few
kilobytes (e.g. 4 KB).
● Hadoop has a much larger block size. By default it is 64 MB (Hadoop 1.x;
128 MB from Hadoop 2.x onward).
● Files in HDFS are broken down into block sized chunks and are stored as
independent units.
● However, files smaller than a block size do not occupy the entire block.
○ Should I care?
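
As a rough illustration of the chunking described above, the sketch below computes the chunk sizes for a file of a given length. The function name `split_into_blocks` and the `BLOCK_SIZE` constant are hypothetical, chosen for this example only; they are not part of any HDFS API.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size in bytes (64 MB, as on the slide)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the chunks a file is broken into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last chunk only takes as much space as the data it holds.
        sizes.append(remainder)
    return sizes
```

Under these assumptions a 150 MB file becomes two 64 MB chunks plus one 22 MB chunk, and a 1 MB file becomes a single 1 MB chunk rather than a full 64 MB one.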

Why such large blocks?
● Minimize disk seek times
● Assuming 10 ms of seek time and 100 MB/s as the disk transfer rate, if the block
size is 100 MB, then seek time is 1% of transfer time, which is small enough
to ignore.
● Hence the default is 64 MB, while many production environments also use 128
MB.
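
The arithmetic above can be checked with a small sketch. The function name `seek_overhead` is hypothetical; the 10 ms seek time and 100 MB/s transfer rate are the figures assumed on the slide, not measurements.

```python
def seek_overhead(block_size_mb, seek_time_ms=10.0, transfer_rate_mb_s=100.0):
    """Return seek time as a fraction of the time needed to transfer one block."""
    transfer_time_ms = block_size_mb / transfer_rate_mb_s * 1000.0
    return seek_time_ms / transfer_time_ms


# Compare a few block sizes under the same assumed disk characteristics.
for size_mb in (4 / 1024, 64, 100, 128):
    print(f"{size_mb:>10.4f} MB block: seek is {seek_overhead(size_mb):.2%} of transfer time")
```

With a 4 KB block the seek takes far longer than the transfer itself, whereas with blocks of 64 MB or more the seek cost falls to roughly 1-2% of the transfer time.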

HDFS Architecture
Image Source: https://hadoop.apache.org

Namenode and Datanode
● Master - Namenode
○ Manages file system namespace
○ File system tree and metadata for all files and directories
○ Stores this info in -
■ Namespace image
■ Edit log
○ Knows for a given file which datanodes have the corresponding blocks. This mapping is
not stored persistently; it is reconstructed at startup from the datanodes' block reports
● Worker - Datanode
○ Store and retrieve blocks as requested by clients
○ Periodically report back to the namenode on the list of blocks they are storing
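
The division of labour above can be sketched as a toy model. This is not Hadoop code: the class `ToyNamenode`, its methods, and the block and datanode ids are all hypothetical. It only shows the idea that the namespace (file to blocks) is kept by the namenode, while the block-to-datanode mapping is built from what the datanodes report.

```python
from collections import defaultdict


class ToyNamenode:
    def __init__(self, namespace):
        # File path -> ordered list of block ids. In the real system this
        # information lives in the namespace image and edit log.
        self.namespace = namespace
        # Block id -> set of datanode ids. Kept in memory only and
        # rebuilt from the block reports sent by datanodes.
        self.block_locations = defaultdict(set)

    def receive_block_report(self, datanode_id, block_ids):
        """Replace everything known about one datanode with its latest report."""
        for holders in self.block_locations.values():
            holders.discard(datanode_id)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_id)

    def locate(self, path):
        """Return (block id, sorted datanode ids) for each block of a file."""
        return [
            (block_id, sorted(self.block_locations.get(block_id, ())))
            for block_id in self.namespace[path]
        ]
```

A client would ask `locate("/some/file")` and then contact the listed datanodes directly for the block data; a freshly started `ToyNamenode` returns empty location lists until reports arrive.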

HDFS Storage
Image Source: https://developer.yahoo.com/hadoop/tutorial/module2.html

Secondary Namenode
Image Source: http://www.quickmeme.com/meme/35ke38

Secondary Namenode
● Not a backup namenode
● Periodically merges the namespace image with the edit log, so that the edit
log does not become too large
● Usually runs on a different machine than the namenode
● However, the secondary always lags behind the primary, and hence the
merged copy by itself cannot be relied on in case of primary failure
● In the event of primary failure, copy the primary's namespace image to the
secondary and run it as the new primary.
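
The merge step and the resulting lag can be sketched as follows. This is a toy model under simplifying assumptions: the namespace image is represented as a set of paths, the edit log as a list of tuples, and the names `apply_edit` and `checkpoint` are hypothetical rather than Hadoop functions.

```python
def apply_edit(image, edit):
    """Apply one logged operation to a namespace image (a set of paths)."""
    op = edit[0]
    if op == "create":
        image.add(edit[1])
    elif op == "delete":
        image.discard(edit[1])
    elif op == "rename":
        image.remove(edit[1])
        image.add(edit[2])
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image; return (new image, empty log)."""
    merged = set(image)
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


# Edits accumulate on the primary between checkpoints.
image = {"/a"}
edit_log = [("create", "/b"), ("rename", "/a", "/c")]
merged_image, edit_log = checkpoint(image, edit_log)

# Anything logged after the checkpoint is missing from the merged copy,
# which is why the secondary always lags behind the primary.
edit_log.append(("create", "/d"))
```

Here `merged_image` contains `/b` and `/c` but not `/d`, which exists only in the primary's edit log until the next checkpoint.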

Writing a File in HDFS
Image Source: Hadoop The definitive guide, 4th edition

Reading a file in HDFS

HDFS Block Placement

Small File Problem?
Each file occupies space in the namenode's namespace irrespective of its size!
Image Source: http://www.bodhtree.com/blog/2012/09/28/hadoop-how-to-manage-huge-numbers-of-small-files-in-hdfs/

Further Reading
HDFS Comics :-)
https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1

Thank You!!
Please send your questions to:
bhupesh@apache.org
