Notebook: Big Data Technologies
Created: 2018-03-18 02:25  Updated: 2018-04-29 02:19
Author: Oracle@MarkLin
URL: https://download.csdn.net/download/sum__mer/9630121
Setting Up a Hadoop Development Environment on Linux
1.Hadoop:
Overview
Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data
(multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable,
fault-tolerant manner.
A MapReduce job usually splits the input data-set into independent chunks which are processed by the map
tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to
the reduce tasks. Typically both the input and the output of the job are stored in a file-system. The framework
takes care of scheduling tasks, monitoring them and re-executes the failed tasks.
Typically the compute nodes and the storage nodes are the same, that is, the MapReduce framework and the
Hadoop Distributed File System (see HDFS Architecture Guide) are running on the same set of nodes. This
configuration allows the framework to effectively schedule tasks on the nodes where data is already present,
resulting in very high aggregate bandwidth across the cluster.
The MapReduce framework consists of a single master ResourceManager, one slave NodeManager per cluster-
node, and MRAppMaster per application (see YARN Architecture Guide).
Minimally, applications specify the input/output locations and supply map and reduce functions via
implementations of appropriate interfaces and/or abstract-classes. These, and other job parameters, comprise
the job configuration.
The Hadoop job client then submits the job (jar/executable etc.) and configuration to the ResourceManager which
then assumes the responsibility of distributing the software/configuration to the slaves, scheduling tasks and
monitoring them, providing status and diagnostic information to the job-client.
Although the Hadoop framework is implemented in Java™, MapReduce applications need not be written in Java.
Hadoop Streaming is a utility which allows users to create and run jobs with any executables (e.g. shell utilities)
as the mapper and/or the reducer.
Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non JNI™ based).
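The Streaming model described above can be sketched locally. The following is a minimal, hypothetical word-count pair (not from the original note): the mapper turns each input line into tab-separated `(word, 1)` pairs, and the reducer aggregates counts per key from sorted input, exactly the contract the framework imposes on Streaming executables.

```python
from itertools import groupby

def map_line(line):
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    return [(word, 1) for word in line.split()]

def reduce_pairs(pairs):
    """Reducer: sum counts per word. The input must already be
    sorted by key, which the MapReduce framework guarantees."""
    return [(word, sum(count for _, count in group))
            for word, group in groupby(pairs, key=lambda kv: kv[0])]

if __name__ == "__main__":
    # Local demonstration of the map -> sort -> reduce data flow.
    # In a real Streaming job the mapper and reducer are separate
    # executables, and the framework performs the sort and wires
    # them together via stdin/stdout.
    sample = ["hello hadoop", "hello world"]
    pairs = []
    for line in sample:
        pairs.extend(map_line(line))
    for word, count in reduce_pairs(sorted(pairs)):
        print(f"{word}\t{count}")
```

In an actual job the two functions would live in separate scripts passed to the `hadoop-streaming` jar via its `-mapper` and `-reducer` options.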
Inputs and Outputs
The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to
the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job,
conceivably of different types.
The key and value classes have to be serializable by the framework and hence need to implement
the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to
facilitate sorting by the framework.
Input and Output types of a MapReduce job:
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
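The typed stages above can be made concrete with a small local sketch. This uses a hypothetical maximum-temperature job (the data and function names are illustrative, not part of any Hadoop API): `(k1, v1)` is a byte offset and a text line, `(k2, v2)` is a year and a temperature, and the combine step has the same key/value types as its input, so it can run locally on each mapper's output.

```python
from collections import defaultdict

def map_record(offset, line):
    """map: (k1, v1) = (byte offset, line) -> (k2, v2) = (year, temp)."""
    year, temp = line.split()
    return year, int(temp)

def combine(pairs):
    """combine: (k2, v2) -> (k2, v2). Runs on one mapper's local
    output to shrink the data shuffled to the reducers."""
    best = defaultdict(lambda: float("-inf"))
    for year, temp in pairs:
        best[year] = max(best[year], temp)
    return sorted(best.items())

def reduce_pairs(pairs):
    """reduce: sorted (k2, v2) -> (k3, v3) = (year, overall max)."""
    return combine(pairs)  # for max, reduce and combine share logic

if __name__ == "__main__":
    # Two input splits, each handled by its own (simulated) mapper.
    split1 = [(0, "1950 22"), (8, "1950 31")]
    split2 = [(0, "1951 25"), (8, "1950 27")]
    mapped1 = [map_record(o, l) for o, l in split1]
    mapped2 = [map_record(o, l) for o, l in split2]
    # The framework sorts the combined map outputs before reducing.
    shuffled = sorted(combine(mapped1) + combine(mapped2))
    print(reduce_pairs(shuffled))  # [('1950', 31), ('1951', 25)]
```

Note that combine is an optimization: because taking a maximum is associative, applying it per-mapper does not change the reducer's result.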
2. Installing and Configuring the Hadoop Environment
[1] Download the Hadoop 2.7.5 package from the official site:
hadoop-2.7.5/hadoop-2.7.5.tar.gz

[2] Upload the package to /usr/local/hadoop with Xftp5.
[3] Log in to the Linux server with Xshell5 and change to the directory: cd /usr/local/hadoop
[root@marklin hadoop]# cd /usr/local/hadoop
[root@marklin hadoop]#
Extract the archive with tar -xvf hadoop-2.7.5.tar.gz:
[root@marklin hadoop]# tar -xvf hadoop-2.7.5.tar.gz
[4] Configure the Hadoop environment variables: vim /etc/profile
#Setting HADOOP_HOME PATH
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.5
export PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HADOOP_HOME}/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
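After saving /etc/profile, the new variables can be applied to the current shell and checked (assuming the install path used above):

```shell
# Reload the profile so the new variables take effect in this shell
source /etc/profile

# Confirm the variable resolves to the install directory
echo "$HADOOP_HOME"        # expected: /usr/local/hadoop/hadoop-2.7.5

# Verify the hadoop binaries are now on PATH
hadoop version
```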