MapReduce and Hadoop
Cadenelli Nicola
Datenbanken Implementierungstechniken
Outline
Introduction
● History
● Motivations
MapReduce
● What MapReduce is
● Why it is useful
● Execution Details
● Some Examples
● Conclusions
Hadoop
● Introduction
● Hadoop Architecture
● Hadoop Ecosystem
● In the real world
MapReduce & Databases
● SQL-MapReduce
● In-Database Map-Reduce
● Conclusions
Introduction MapReduce Hadoop MR&Databases
[Figure: Google's GFS, MapReduce and BigTable shown next to Hadoop's HDFS and MapReduce]
A Brief History
● 2003-2004: Google publishes the GFS and MapReduce papers.
● 2006: Apache releases Hadoop, the first open-source implementation of GFS and MapReduce.
● Now: Amazon, AOL, eBay, Facebook, HP, IBM, Last.fm, LinkedIn, Microsoft, Spotify, Twitter and more are using Hadoop.
Motivations
● Data sets have become really big: more than 10 TB.
E.g.: Large Synoptic Survey Telescope (30 TB / night)
● The best idea is to scale out (not scale up) the system, but . . .
- How do we scale to more than 1000 machines?
- How do we handle machine failures?
- How can we facilitate communication between nodes?
- If we change system, do we lose all our optimisation work?
● Google needed to recreate the index of the web.
What is MapReduce?
“MapReduce is a programming model and an associated implementation for processing and generating large data sets.” – Google, Inc. MapReduce paper, 2004.
It is a really simple API with just two serial functions, map() and reduce(), and it is language independent (Java, Python, Perl …).
Why is MapReduce useful?
MapReduce hides messy details in the runtime library:
● Parallelization and distribution
● Load balancing
● Network and disk transfer optimization
● Handling of machine failures
● Fault tolerance
● Monitoring & status updates
All users benefit from improvements to the core library.
Typical problem solved by MapReduce
1. Read a lot of data
2. Map: extract something we care about from each record
3. Shuffle and Sort
4. Reduce: aggregate, summarize, filter, or transform
5. Write the results
From the outside the outline is always the same (read, process, write); only map and reduce change to fit the problem.
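The five steps above can be sketched as a single-process Python driver. This is only an illustration of the outline, not how a distributed runtime is built: `run_mapreduce`, `map_fn` and `reduce_fn` are hypothetical names, and the sample job (longest word per initial letter) is an assumption chosen to show that the outline stays fixed while map and reduce change.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def run_mapreduce(
    records: Iterable[str],
    map_fn: Callable[[str], Iterable[Tuple[str, str]]],
    reduce_fn: Callable[[str, List[str]], str],
) -> Dict[str, str]:
    """Toy, single-process version of the read/map/shuffle/reduce/write outline."""
    # 1. Read: iterate over the input records.
    # 2. Map: extract (key, value) pairs from each record.
    intermediate: List[Tuple[str, str]] = []
    for record in records:
        intermediate.extend(map_fn(record))

    # 3. Shuffle and sort: group values by key; keys are visited in sorted order below.
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # 4. Reduce: collapse each group to one result.
    # 5. Write: here the "output" is simply a dictionary.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


# Hypothetical job: find the longest word for each initial letter.
def first_letter_map(line: str) -> Iterable[Tuple[str, str]]:
    for word in line.lower().split():
        yield word[0], word


def longest_reduce(key: str, values: List[str]) -> str:
    # max() keeps the first of several equally long words.
    return max(values, key=len)
```

Swapping in a different `map_fn` and `reduce_fn` changes the job without touching the driver.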
Some Execution Details
● A single master controls job execution on multiple slaves.
● Mappers are preferentially placed on the same node or the same rack as their input block → minimizes network usage!
● Mappers save their outputs to local disk before serving them to reducers.
● If a map or reduce task crashes: re-execute it!
● A job can have more mappers and reducers than nodes.
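The "re-execute" rule can be illustrated with a small retry loop. This is a sketch under stated assumptions: `TaskFailed`, `run_with_reexecution`, `make_flaky_task` and the `max_attempts` default are all hypothetical, and a real master would reschedule the task on another node rather than call it again in the same process.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TaskFailed(Exception):
    """Raised by a task to simulate a crashed mapper or reducer."""


def run_with_reexecution(task: Callable[[], T], max_attempts: int = 4) -> T:
    """Run a task and re-execute it from scratch each time it crashes."""
    last_error: Optional[TaskFailed] = None
    for _ in range(max_attempts):
        try:
            return task()
        except TaskFailed as error:
            last_error = error  # forget partial work and simply try again
    raise RuntimeError(f"task failed {max_attempts} times") from last_error


def make_flaky_task(failures_before_success: int) -> Callable[[], str]:
    """Build a task that crashes a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def task() -> str:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TaskFailed(f"crash on attempt {state['calls']}")
        return f"done after {state['calls']} attempts"

    return task
```

Re-running a task is safe in this sketch because the task recomputes its output from its input each time and keeps nothing from the failed attempt.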
[Figure: Execution overview. Source: Google, Inc., MapReduce paper, 2004.]
MapReduce Programming Model
The programmer has to write two primary methods:
map (k1,v1) → list(k2,v2)
reduce (k2,list(v2)) → list(k2,v2)
● All v2 with the same k2 are reduced together, in order.
● The input keys and values are drawn from a different domain than the output keys and values.
Example: Word Frequency
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");
reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
Input: “documentx”, “To be or not to be”
Output: “be”, 2; “not”, 1; “or”, 1; “to”, 2
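A runnable Python counterpart of the pseudocode might look like the sketch below. The function names are hypothetical, the shuffle step is simulated with an in-memory dictionary, and lower-casing the words is an assumption made so that "To" and "to" are counted together as in the slide's output.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(key: str, value: str) -> Iterator[Tuple[str, str]]:
    """key: document name, value: document contents."""
    for word in value.lower().split():
        yield word, "1"


def reduce_counts(key: str, values: Iterable[str]) -> str:
    """key: a word, values: a list of counts (as strings)."""
    result = 0
    for v in values:
        result += int(v)
    return str(result)


def word_frequency(documents: Dict[str, str]) -> Dict[str, int]:
    """Run map, an in-memory shuffle, and reduce over all documents."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_words(name, contents):
            grouped[word].append(count)
    return {word: int(reduce_counts(word, grouped[word])) for word in sorted(grouped)}
```

Calling `word_frequency({"documentx": "To be or not to be"})` yields the four counts listed on the slide.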
[Figure: word-count data flow through Map, Shuffle and Sort, and Reduce]
Input: “document1”, “To be or not to be”; “document2”, “text”; ...
Map output: “to”, 1; “be”, 1; “or”, 1; “not”, 1; “to”, 1; “be”, 1; ...
Shuffle and Sort (aggregate values by key): key = “be”, values = “1”, “1”; key = “not”, values = “1”; key = “or”, values = “1”; key = “to”, values = “1”, “1”; ...
Reduce output: “be”, 2; “not”, 1; “or”, 1; “to”, 2; ...
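The Shuffle and Sort stage in the data flow above can be sketched with a sort followed by `itertools.groupby`. This is an in-memory illustration only; `shuffle_and_sort` is a hypothetical name, and the output used for the second document in the usage note is made up, since the slide elides it.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def shuffle_and_sort(mapper_outputs: Iterable[Iterable[Pair]]) -> List[Tuple[str, List[str]]]:
    """Merge the output of several mappers and group the values by key."""
    all_pairs = [pair for output in mapper_outputs for pair in output]
    all_pairs.sort(key=itemgetter(0))  # stable sort on the key only
    return [
        (key, [value for _, value in group])
        for key, group in groupby(all_pairs, key=itemgetter(0))
    ]
```

For example, feeding it the six pairs emitted for “To be or not to be” plus a hypothetical `[("text", "1")]` from a second mapper produces one entry per distinct word, in key order, each carrying the list of "1" values that the reducer will sum.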
Other examples
● Inverted index
- Find which documents contain a specific word.
- Map: parse each document, emit <word, document-ID> pairs.
- Reduce: for each word, sort the corresponding document IDs. Emit <word, list(document-ID)>
● Reverse web-link graph
- Find where page links come from.
- Map: output <target, source> for each link to target in a page source.
- Reduce: concatenate the list of all source URLs associated with a target. Emit <target, list(source)>
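Both examples can be sketched in a few lines of Python. All names here are hypothetical, the grouping helper stands in for the shuffle step, and removing duplicate document IDs in the inverted index is an assumption the slide does not state.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Stand-in for shuffle: collect all values emitted for each key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def inverted_index(documents: Dict[str, str]) -> Dict[str, List[str]]:
    # Map: emit <word, document-ID> for every word in every document.
    pairs = (
        (word, doc_id)
        for doc_id, text in documents.items()
        for word in text.lower().split()
    )
    # Reduce: sort (and, by assumption, de-duplicate) the document IDs of each word.
    return {word: sorted(set(ids)) for word, ids in group_by_key(pairs).items()}


def reverse_link_graph(links: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Map: emit <target, source> for every link found in page `source`.
    pairs = (
        (target, source)
        for source, targets in links.items()
        for target in targets
    )
    # Reduce: concatenate all sources that point at the same target.
    return group_by_key(pairs)
```

The two jobs share the same grouping step and differ only in what the map phase emits and how the reduce phase combines the values.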
Conclusions on MapReduce
● Proven to be a useful abstraction
● Really simplifies large-scale computations
● Fun to use:
- Focus on the problem
- Let the library deal with the messy details
[Figure: Google's GFS and MapReduce shown next to Hadoop's HDFS and MapReduce]
What is Hadoop?
● A framework for distributed processing
● Open source (Apache License 2.0)
● A top-level Apache project
● Written in Java
● Batch-processing centric
● Runs on commodity hardware
Hadoop Distributed File System
● For very large files: TBs, PBs.
● Each file is partitioned into chunks of 64 MB.
● Each chunk is replicated several times (>=3), on different racks, for fault tolerance.
● It is an abstract file system layered on top of the native one: the disks themselves are formatted with ext3, ext4 or XFS.
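As a worked example of the chunking and replication figures above, the sketch below computes how many chunks and replicas a file produces. The function name is hypothetical, and the defaults simply reuse the 64 MB chunk size and the replication factor of 3 quoted on the slide.

```python
import math
from typing import Tuple

BLOCK_SIZE_MB = 64  # chunk size quoted on the slide
REPLICATION = 3     # minimum replication quoted on the slide


def hdfs_footprint(
    file_size_mb: int,
    block_size_mb: int = BLOCK_SIZE_MB,
    replication: int = REPLICATION,
) -> Tuple[int, int, int]:
    """Return (number of chunks, number of chunk replicas, raw MB stored)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The final chunk only occupies the bytes it actually holds.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb
```

A 1 GB (1024 MB) file therefore splits into 16 chunks, stored as 48 replicas that occupy 3072 MB of raw disk across the cluster.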
Hadoop Architecture
● TaskTracker is the MapReduce server (processing part)
● DataNode is the HDFS server (data part)
[Figure: a machine running a TaskTracker and a DataNode]
Hadoop Architecture - Master/Slave
JobTracker:
● Accepts users' jobs
● Assigns tasks to workers
● Keeps track of job status
[Figure: a JobTracker coordinating several machines, each running a TaskTracker and a DataNode]
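How a master might assign a map task to a worker, following the "same node, then same rack" preference mentioned under the execution details, can be sketched as below. This is not Hadoop's scheduler: `pick_node`, its parameters and the node and rack names in the usage note are all hypothetical.

```python
from typing import Dict, List, Optional


def pick_node(
    block_replicas: List[str],
    free_nodes: List[str],
    rack_of: Dict[str, str],
) -> Optional[str]:
    """Choose a free node for a map task: same node, then same rack, then any."""
    # 1. Prefer a free node that already stores the input block.
    for node in free_nodes:
        if node in block_replicas:
            return node
    # 2. Otherwise prefer a free node on the same rack as one of the replicas.
    replica_racks = {rack_of[node] for node in block_replicas}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node
    # 3. Otherwise take any free node; the block is then read over the network.
    return free_nodes[0] if free_nodes else None
```

With nodes `n1` and `n2` on rack `r1` and `n3` on rack `r2`, a task whose block lives on `n1` goes to `n2` when `n1` is busy, and to `n3` only when nothing on `r1` is free.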
Hadoop Architecture - Master/Slave
NameNode:
● Keeps information on data location
● Decides where a file has to be written
Data never flows through the NameNode!
[Figure: a NameNode coordinating several machines, each running a TaskTracker and a DataNode]
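The split between metadata and data can be illustrated with a toy model. The classes and the `read_file` helper below are hypothetical and do not mirror the real HDFS client API; the point is only that the NameNode answers "where are the blocks?" while the bytes come straight from DataNodes.

```python
from typing import Dict, List, Tuple


class ToyNameNode:
    """Holds only metadata: which DataNodes store each block of each file."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[List[str]]] = {}

    def register(self, path: str, block_locations: List[List[str]]) -> None:
        self._locations[path] = block_locations

    def locate(self, path: str) -> List[List[str]]:
        return self._locations[path]


class ToyDataNode:
    """Stores the actual block contents, keyed by (path, block index)."""

    def __init__(self) -> None:
        self.blocks: Dict[Tuple[str, int], bytes] = {}


def read_file(path: str, name_node: ToyNameNode, data_nodes: Dict[str, ToyDataNode]) -> bytes:
    """Ask the NameNode where the blocks are, then fetch them from DataNodes."""
    data = b""
    for index, holders in enumerate(name_node.locate(path)):
        first_holder = data_nodes[holders[0]]  # any replica would do
        data += first_holder.blocks[(path, index)]
    return data
```

In this model the NameNode never sees the contents of a block, which is what keeps it from becoming a bandwidth bottleneck.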
Hadoop Architecture – Scalable
● Multiple machines running Hadoop form a cluster.
● What if we need more storage or compute power?
[Figure: additional machines, each running a TaskTracker and a DataNode]
Hadoop Architecture - Overview
[Figure: Client, JobTracker, NameNode and Secondary NameNode, with a File split into parts A, B and C]
Hadoop Ecosystem – Pig & Hive
[Figure: stack diagram showing HDFS, MapReduce, Pig and Hive]
Hadoop Ecosystem – HBase
[Figure: stack diagram showing HDFS, MapReduce, Pig, Hive and HBase]
What is MapReduce/Hadoop used for?
@Google
● Index construction for Google Search
● Article clustering for Google News
● Statistical machine translation
@Yahoo! (4100 nodes)
● “Web map” powering Yahoo! Search
● Spam detection for Yahoo! Mail
@Facebook (>100 PB of storage)
● Data mining
● Ad optimization
● Spam detection
but . . .
MapReduce's use of input files and lack of schema support prevent the performance improvements enabled by features like B-trees and hash partitioning . . .
. . . and most of the data in companies is stored in databases!
Solutions
● SQL-MapReduce by Teradata Aster
● In-Database Map-Reduce by Oracle
● Connectors that let external Hadoop programs read data from databases and store Hadoop output in databases
SQL-MapReduce
A framework that lets developers write SQL-MapReduce functions in languages such as Java, C#, Python and C++ and push them into the database for advanced in-database analytics.
SQL-MapReduce - Syntax
MR functions can be used like custom SQL operators and can implement any algorithm or transformation.
http://www.asterdata.com/resources/mapreduce.php
Example: Word Frequency
Demo #1: Map (Tokenization) and Reduce (WordCount) in SQL/MR
SELECT key AS word, value AS wordcount
FROM WordCountReduce (
    ON Tokenize ( ON blogs )
    PARTITION BY key
)
ORDER BY wordcount DESC
LIMIT 20;
Demo #2: Why do Reduce when we have SQL?
SELECT word, count(*) AS wordcount
FROM Tokenize ( ON blogs )
GROUP BY word
ORDER BY wordcount DESC
LIMIT 20;
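The idea behind Demo #2 (tokenize with a map-style function, then let plain SQL aggregate) can be tried locally with Python's built-in `sqlite3` module. This is an analogy, not Aster's SQL/MR: SQLite has no `Tokenize` table function, so tokenization happens in Python, and the `tokens` table, the `top_words` function and the extra `word` tie-break in `ORDER BY` are assumptions added for the sketch.

```python
import sqlite3
from typing import List, Tuple


def top_words(blogs: List[str], limit: int = 20) -> List[Tuple[str, int]]:
    """Tokenize in Python (the "map"), then let SQL do the aggregation."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE tokens (word TEXT)")
        # One row per token, like the output of a tokenizing map function.
        connection.executemany(
            "INSERT INTO tokens (word) VALUES (?)",
            ((word,) for text in blogs for word in text.lower().split()),
        )
        # GROUP BY replaces the hand-written reduce function.
        rows = connection.execute(
            "SELECT word, count(*) AS wordcount FROM tokens "
            "GROUP BY word ORDER BY wordcount DESC, word LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        connection.close()
    return rows
```

The sketch mirrors the point of the demo: once the map output is available as rows, ordinary `GROUP BY` can take over the counting.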
In-Database Map-Reduce by Oracle
● Uses Table Functions to implement Map-Reduce within the database.
● Parallelization is provided by the Oracle Parallel Execution framework.
Using this in combination with SQL, Oracle provides a simple mechanism for database developers to develop Map-Reduce functionality using languages they know.
Example: Word Frequency
SELECT *
FROM table(oracle_map_reduce.reducer(
    cursor(
        SELECT value(map_result).word word
        FROM table(oracle_map_reduce.mapper(
            cursor(SELECT a FROM documents), ' '
        )) map_result
    )
));
Still not perfect!
However, these solutions are not source compatible with Hadoop.
Native Hadoop programs need to be rewritten before they can be used in databases.
Questions?
