Notebook: Big Data Technologies
Created: 2018-03-18 02:25  Updated: 2018-04-29 02:19
Author: Oracle@MarkLin
URL: https://download.csdn.net/download/sum__mer/9630121
Setting Up a Hadoop Development Environment on Linux
1.Hadoop:
Overview
Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data
(multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable,
fault-tolerant manner.
A MapReduce job usually splits the input data-set into independent chunks which are processed by the map
tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to
the reduce tasks. Typically both the input and the output of the job are stored in a file-system. The framework
takes care of scheduling tasks, monitoring them and re-executes the failed tasks.
Typically the compute nodes and the storage nodes are the same, that is, the MapReduce framework and the
Hadoop Distributed File System (see HDFS Architecture Guide) are running on the same set of nodes. This
configuration allows the framework to effectively schedule tasks on the nodes where data is already present,
resulting in very high aggregate bandwidth across the cluster.
The MapReduce framework consists of a single master ResourceManager, one slave NodeManager per cluster-
node, and MRAppMaster per application (see YARN Architecture Guide).
Minimally, applications specify the input/output locations and supply map and reduce functions via
implementations of appropriate interfaces and/or abstract-classes. These, and other job parameters, comprise
the job configuration.
The Hadoop job client then submits the job (jar/executable etc.) and configuration to the ResourceManager which
then assumes the responsibility of distributing the software/configuration to the slaves, scheduling tasks and
monitoring them, providing status and diagnostic information to the job-client.
Although the Hadoop framework is implemented in Java™, MapReduce applications need not be written in Java.
Hadoop Streaming is a utility which allows users to create and run jobs with any executables (e.g. shell utilities)
as the mapper and/or the reducer.
Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non JNI™ based).
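The Streaming model described above can be sketched locally. The following is a minimal, hypothetical word-count pair (not from the original note): the mapper turns each input line into tab-separated `(word, 1)` pairs, and the reducer aggregates counts per key from sorted input, exactly the contract the framework imposes on Streaming executables.

```python
from itertools import groupby

def map_line(line):
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    return [(word, 1) for word in line.split()]

def reduce_pairs(pairs):
    """Reducer: sum counts per word. The input must already be
    sorted by key, which the MapReduce framework guarantees."""
    return [(word, sum(count for _, count in group))
            for word, group in groupby(pairs, key=lambda kv: kv[0])]

if __name__ == "__main__":
    # Local demonstration of the map -> sort -> reduce data flow.
    # In a real Streaming job the mapper and reducer are separate
    # executables, and the framework performs the sort and wires
    # them together via stdin/stdout.
    sample = ["hello hadoop", "hello world"]
    pairs = []
    for line in sample:
        pairs.extend(map_line(line))
    for word, count in reduce_pairs(sorted(pairs)):
        print(f"{word}\t{count}")
```

In an actual job the two functions would live in separate scripts passed to the `hadoop-streaming` jar via its `-mapper` and `-reducer` options.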
Inputs and Outputs
The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to
the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job,
conceivably of different types.
The key and value classes have to be serializable by the framework and hence need to implement
the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to
facilitate sorting by the framework.
Input and Output types of a MapReduce job:
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
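The typed stages above can be made concrete with a small local sketch. This uses a hypothetical maximum-temperature job (the data and function names are illustrative, not part of any Hadoop API): `(k1, v1)` is a byte offset and a text line, `(k2, v2)` is a year and a temperature, and the combine step has the same key/value types as its input, so it can run locally on each mapper's output.

```python
from collections import defaultdict

def map_record(offset, line):
    """map: (k1, v1) = (byte offset, line) -> (k2, v2) = (year, temp)."""
    year, temp = line.split()
    return year, int(temp)

def combine(pairs):
    """combine: (k2, v2) -> (k2, v2). Runs on one mapper's local
    output to shrink the data shuffled to the reducers."""
    best = defaultdict(lambda: float("-inf"))
    for year, temp in pairs:
        best[year] = max(best[year], temp)
    return sorted(best.items())

def reduce_pairs(pairs):
    """reduce: sorted (k2, v2) -> (k3, v3) = (year, overall max)."""
    return combine(pairs)  # for max, reduce and combine share logic

if __name__ == "__main__":
    # Two input splits, each handled by its own (simulated) mapper.
    split1 = [(0, "1950 22"), (8, "1950 31")]
    split2 = [(0, "1951 25"), (8, "1950 27")]
    mapped1 = [map_record(o, l) for o, l in split1]
    mapped2 = [map_record(o, l) for o, l in split2]
    # The framework sorts the combined map outputs before reducing.
    shuffled = sorted(combine(mapped1) + combine(mapped2))
    print(reduce_pairs(shuffled))  # [('1950', 31), ('1951', 25)]
```

Note that combine is an optimization: because taking a maximum is associative, applying it per-mapper does not change the reducer's result.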
2. Installing and Configuring the Hadoop Environment
[1] Download the Hadoop 2.7.5 package from the official site:
hadoop-2.7.5/hadoop-2.7.5.tar.gz

[2] Upload the package to /usr/local/hadoop with Xftp5.
[3] Log in to the Linux server with Xshell5 and change to the directory: cd /usr/local/hadoop
[root@marklin hadoop]# cd /usr/local/hadoop
[root@marklin hadoop]#
Extract the archive with tar -xvf hadoop-2.7.5.tar.gz:
[root@marklin hadoop]# tar -xvf hadoop-2.7.5.tar.gz
[4] Configure the Hadoop environment variables: vim /etc/profile
#Setting HADOOP_HOME PATH
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.5
export PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HADOOP_HOME}/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
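After saving /etc/profile, the new variables can be applied to the current shell and checked (assuming the install path used above):

```shell
# Reload the profile so the new variables take effect in this shell
source /etc/profile

# Confirm the variable resolves to the install directory
echo "$HADOOP_HOME"        # expected: /usr/local/hadoop/hadoop-2.7.5

# Verify the hadoop binaries are now on PATH
hadoop version
```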