A part of the Nordic IT group EVRY
Infopulse
Oleksiy Krotov (Expert Oracle DBA)
19.01.2016
BIG DATA: Apache Hadoop
Apache Hadoop
HADOOP ARCHITECTURE
HADOOP INTERFACE
HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
HADOOP MAPREDUCE
ORACLE BIG DATA
RESOURCES
Hadoop Architecture
Apache Hadoop is an open-source framework for distributed storage and
distributed processing of very large data sets. It has two core parts:
the storage part, known as the Hadoop Distributed File System (HDFS)
the processing part, called MapReduce
Hadoop splits files into large blocks and distributes them across the nodes
in a cluster. To process the data, Hadoop ships packaged code to the nodes,
and each node processes in parallel the blocks it holds.
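As a rough illustration of the block-splitting idea, the sketch below cuts a file of a given size into fixed-size blocks and hands them out to nodes in turn. The 128 MB block size, the node names, and the round-robin placement are assumptions made for this example only; in a real cluster the NameNode decides placement and also takes replication and racks into account.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (illustrative)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs that cover the whole file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be short
        blocks.append((offset, length))
        offset += length
    return blocks


def assign_round_robin(blocks, nodes):
    """Toy placement: block i goes to node i modulo the number of nodes."""
    return {i: nodes[i % len(nodes)] for i in range(len(blocks))}


if __name__ == "__main__":
    one_gb = 1024 ** 3
    blocks = split_into_blocks(one_gb)
    # Hypothetical node names, used only for the demonstration.
    placement = assign_round_robin(blocks, ["node1", "node2", "node3"])
    print(len(blocks), "blocks")
    print(placement)
```

Each node can then work on its own blocks without waiting for the others, which is what makes the parallel processing step possible.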
Hadoop Architecture
Biggest Hadoop cluster: Yahoo! has more than 100,000 CPUs in over
40,000 servers running Hadoop; its biggest Hadoop cluster runs
4,500 nodes, with 455 petabytes of data in Hadoop (2014)
More than half of the Fortune 50 companies run Cloudera's
distribution of open-source Apache Hadoop (2012)
The HDFS file system is not restricted to MapReduce jobs. It can be
used for other applications, many of which are under development at
Apache. The list includes the HBase database, the Apache Mahout
machine learning system, and the Apache Hive Data Warehouse
system. Hadoop can in theory be used for any sort of work that is
batch-oriented rather than real-time, is very data-intensive, and
benefits from parallel processing of data.
Hadoop Architecture
NameNode hosts the metadata (the file system index of files and blocks)
DataNode hosts the data (the blocks)
JobTracker is the master that accepts MapReduce jobs and schedules their tasks
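To make the split between metadata and data concrete, here is a minimal toy model, not Hadoop's actual implementation. The class name `ToyNameNode`, the block ids, and the DataNode names are all hypothetical. The point it shows is that the NameNode only answers "which blocks, on which nodes", while the bytes themselves stay on the DataNodes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where they live."""
    file_blocks: dict = field(default_factory=dict)      # path -> [block ids]
    block_locations: dict = field(default_factory=dict)  # block id -> [datanodes]

    def add_file(self, path, placements):
        """Register a file; placements is a list of (block_id, [datanodes])."""
        self.file_blocks[path] = [block_id for block_id, _ in placements]
        for block_id, nodes in placements:
            self.block_locations[block_id] = list(nodes)

    def locate(self, path):
        """Return [(block_id, [datanodes])] in file order for a given path."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


if __name__ == "__main__":
    nn = ToyNameNode()
    # Hypothetical path, block ids and node names.
    nn.add_file("/user/training/purchases.txt",
                [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])])
    print(nn.locate("/user/training/purchases.txt"))
```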
Hadoop Interface
[training@localhost ~]$ hdfs dfsadmin -report
Configured Capacity: 15118729216 (14.08 GB)
Present Capacity: 10163642368 (9.47 GB)
DFS Remaining: 9228095488 (8.59 GB)
DFS Used: 935546880 (892.21 MB)
DFS Used%: 9.2%
Under replicated blocks: 3
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Live datanodes:
Name: 127.0.0.1:50010 (localhost.localdomain)
Hostname: localhost.localdomain
Decommission Status : Normal
Configured Capacity: 15118729216 (14.08 GB)
DFS Used: 935546880 (892.21 MB)
Non DFS Used: 4955086848 (4.61 GB)
DFS Remaining: 9228095488 (8.59 GB)
DFS Used%: 6.19%
DFS Remaining%: 61.04%
Last contact: Mon Jan 18 14:05:48 EST 2016
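The two different "DFS Used%" values in the report above (9.2% and 6.19%) can look contradictory. The short check below reproduces them from the byte counts in the sample output. The interpretation (cluster summary relative to present capacity, per-node figures relative to configured capacity) is inferred from these numbers and should be read as an assumption rather than a statement about Hadoop internals.

```python
# Byte counts copied from the sample `hdfs dfsadmin -report` output.
configured = 15118729216
dfs_used = 935546880
dfs_remaining = 9228095488

# Present capacity appears to be the space actually available to HDFS.
present = dfs_used + dfs_remaining
# What is left over is taken by non-HDFS files on the same disks.
non_dfs_used = configured - present


def pct(part, whole):
    """Percentage rounded to two decimals, as in the report."""
    return round(100.0 * part / whole, 2)


cluster_used_pct = pct(dfs_used, present)            # summary "DFS Used%"
node_used_pct = pct(dfs_used, configured)            # per-node "DFS Used%"
node_remaining_pct = pct(dfs_remaining, configured)  # "DFS Remaining%"

if __name__ == "__main__":
    print(present, non_dfs_used)
    print(cluster_used_pct, node_used_pct, node_remaining_pct)
```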
Hadoop Interface
[training@localhost ~]$ hadoop fs -help get
-get [-ignoreCrc] [-crc] <src> ... <localdst>: Copy files that match the file pattern <src>
to the local name. <src> is kept. When copying multiple,
files, the destination must be a directory.
hadoop fs -ls
hadoop fs -put purchases.txt
hadoop fs -put access_log
hadoop fs -ls
hadoop fs -tail purchases.txt
hadoop fs -get filename
hs {mapper script} {reducer script} {input_file} {output directory}
hs mapper.py reducer.py myinput joboutput
Hadoop Distributed File System (HDFS)
HDFS is a Java-based file system that provides scalable and
reliable data storage, and it was designed to span large
clusters of commodity servers.
HDFS is a scalable, fault-tolerant, distributed storage system
that works closely with a wide variety of concurrent data
access applications
Hadoop Distributed File System (HDFS)
The default replication factor is 3: each block is stored on three
nodes, two on the same rack and one on a different rack.
Data nodes can talk to each other to rebalance data, to
move copies around, and to keep the replication of data
high
Apache Hadoop can work with additional file systems:
FTP, Amazon S3, Windows Azure Storage Blobs (WASB)
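The replication rule described above (three copies, two on one rack and one on another) can be sketched as a small placement function. This is a simplified illustration, not the HDFS placement policy itself: the function name, the rack and node names, and the deterministic "first suitable rack" choice are all assumptions made to keep the example short.

```python
def place_three_replicas(racks):
    """Pick three nodes: two on one rack and one on a different rack.

    racks maps a rack name to the list of node names on that rack.
    Returns a list of (rack, node) tuples.
    """
    rack_names = sorted(racks)
    # The rack that will hold two replicas needs at least two nodes.
    primary = next((r for r in rack_names if len(racks[r]) >= 2), None)
    if primary is None:
        raise ValueError("need a rack with at least two nodes")
    # The third replica goes to any other rack that has a node.
    other = next((r for r in rack_names if r != primary and racks[r]), None)
    if other is None:
        raise ValueError("need a second rack with at least one node")
    return [
        (primary, racks[primary][0]),
        (primary, racks[primary][1]),
        (other, racks[other][0]),
    ]


if __name__ == "__main__":
    # Hypothetical cluster layout.
    cluster = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    print(place_three_replicas(cluster))
```

Spreading the copies over two racks means the block survives the loss of a whole rack, while keeping two copies together limits cross-rack traffic.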
Hadoop MapReduce
Hadoop MapReduce is a software framework for easily
writing applications which process vast amounts of
data (multi-terabyte data-sets) in-parallel on large
clusters (thousands of nodes) of commodity hardware
in a reliable, fault-tolerant manner.
A MapReduce job usually splits the input data-set into
independent chunks which are processed by the map
tasks in a completely parallel manner. The framework
sorts the outputs of the maps, which are then input to
the reduce tasks.
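The map, sort, and reduce steps described above can be imitated in a single process with a word count, a common first MapReduce exercise. This is only a sketch: in Hadoop the map tasks run in parallel on different nodes and the framework performs the sort, whereas here everything runs locally and the function names are chosen for this example.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(sorted_pairs):
    """Reduce step: sum the counts of each key; input must be sorted by key."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_job(lines):
    """Run map, then sort by key (the framework's job), then reduce."""
    mapped = list(mapper(lines))
    shuffled = sorted(mapped, key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    print(run_job(["big data", "Big Hadoop data"]))
```

The reducer relies on its input being sorted by key, which is why the sort between the two phases matters.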
Hadoop MapReduce
Usage: $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar [options]
Options:
-input <path> DFS input file(s) for the Map step
-output <path> DFS output directory for the Reduce step
-mapper <cmd|JavaClassName> The streaming command to run
-combiner <cmd|JavaClassName> The streaming command to run
-reducer <cmd|JavaClassName> The streaming command to run
-file <file> File/dir to be shipped in the Job jar file
-inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName Optional.
-outputformat TextOutputFormat(default)|JavaClassName Optional.
-partitioner JavaClassName Optional.
-numReduceTasks <num> Optional.
-inputreader <spec> Optional.
-cmdenv <n>=<v> Optional. Pass env.var to streaming commands
-mapdebug <path> Optional. To run this script when a map task fails
-reducedebug <path> Optional. To run this script when a reduce task fails
-io <identifier> Optional.
-verbose
hs {mapper script} {reducer script} {input_file} {output directory}
hs mapper.py reducer.py myinput joboutput
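The slides do not define `hs`; it is presumably a shorthand from the training environment that wraps the streaming jar. Assuming that, the sketch below builds a plausible full command line from the same four arguments, using only options listed in the usage text above (`-mapper`, `-reducer`, `-file`, `-input`, `-output`). The function name and the exact set of options the shorthand expands to are assumptions.

```python
def build_streaming_command(mapper, reducer, input_path, output_dir,
                            hadoop="$HADOOP_HOME/bin/hadoop",
                            streaming_jar="$HADOOP_HOME/hadoop-streaming.jar"):
    """Return the argument list for a Hadoop Streaming job.

    The scripts are passed with -file so they are shipped with the job.
    """
    return [
        hadoop, "jar", streaming_jar,
        "-mapper", mapper,
        "-reducer", reducer,
        "-file", mapper,
        "-file", reducer,
        "-input", input_path,
        "-output", output_dir,
    ]


if __name__ == "__main__":
    cmd = build_streaming_command("mapper.py", "reducer.py",
                                  "myinput", "joboutput")
    # Printed as one line so it can be pasted into a shell,
    # which expands $HADOOP_HOME.
    print(" ".join(cmd))
```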
Oracle Big Data Connectors
Load Data into the Database
Oracle Loader for Hadoop
– MapReduce job transforms data on Hadoop
into Oracle-ready data types
– Use more Hadoop compute resources
Oracle SQL Connector for HDFS
– Oracle SQL access to data on Hadoop via
external tables
– Use more database compute resources
– Includes option to query in-place
Oracle Big Data Appliance X5-2
Provides enterprise-class security for Hadoop through Oracle Big Data SQL,
which also makes it possible to use a simple SQL query to quickly
explore data across Hadoop, NoSQL, and relational databases.
Resources
https://hadoop.apache.org/docs/stable/
https://en.wikipedia.org/wiki/Apache_Hadoop
https://developer.yahoo.com/hadoop/tutorial/
http://go.cloudera.com/udacity-lesson-1
http://content.udacity-data.com/courses/ud617/access_log.gz
http://content.udacity-data.com/courses/ud617/purchases.txt.gz
https://www.youtube.com/watch?v=acWtid-OOWM
http://www.oracle.com/technetwork/database/bigdata-appliance/overview/index.html
https://www.udacity.com/courses/ud617
Thank you for your attention!
Contact us!
Address:
03056,
24, Polyova Str.,
Kyiv, Ukraine
Phone:
+38 044 457-88-56
Email:
info@infopulse.com.ua