Three components
- JobClient (prepares the runtime environment)
- JobTracker (receives the job)
- TaskScheduler (initializes the job)
Note: this book covers Hadoop 1.x. Hadoop 2.x manages resources with YARN, so JobTracker and TaskTracker no longer exist there.
Comparison of the old and new Hadoop MapReduce frameworks
1. The client is unchanged: most of its APIs and interfaces remain compatible. This keeps the change transparent to developers, who do not need to make large modifications to their source code. However, the old framework's JobTracker and TaskTracker are gone, replaced by three parts: ResourceManager, ApplicationMaster, and NodeManager.
2. ResourceManager is a central service. Its job is to schedule and launch the ApplicationMaster belonging to each Job, and to monitor whether that ApplicationMaster is still alive. Monitoring and restarting the tasks inside a Job is no longer its concern; that is exactly why the ApplicationMaster exists. The ResourceManager handles job and resource scheduling: it receives the job submitted by the JobSubmitter, and, based on the job's context information plus the status information collected from the NodeManagers, starts the scheduling process and allocates a Container in which to run the ApplicationMaster.
3. NodeManager has a narrower role: it maintains Container state and sends heartbeats to the ResourceManager.
4. ApplicationMaster is responsible for all work during one Job's life cycle, similar to the JobTracker in the old framework. Note that every Job (not every kind of Job) has its own ApplicationMaster, which can run on machines other than the one hosting the ResourceManager.
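The ResourceManager/NodeManager split above is wired together through YARN configuration. As a minimal illustrative sketch (the hostname value is an assumption), a `yarn-site.xml` might contain:

```xml
<!-- Minimal illustrative yarn-site.xml; rm-host.example.com is a placeholder. -->
<configuration>
  <!-- Where clients and NodeManagers find the central ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm-host.example.com</value>
  </property>
  <!-- Auxiliary service NodeManagers run so reducers can fetch map output -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```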
Four steps
- The user submits a job.
- JobClient, following the job configuration (JobConf), uploads the files the job needs to a directory in the JobTracker's filesystem (usually HDFS, whose files are shared by all nodes).
- JobClient submits the job to the JobTracker through an RPC interface.
- After the JobTracker receives the job, it notifies the TaskScheduler, which initializes the job.
Details
Job submission process
Executing the shell command
After writing the program, the user packages it into a jar and runs the jar command to submit the job. This is handled by the RunJar class: its main function unpacks the jar, sets up environment variables, passes the runtime arguments to the MapReduce program, and runs it.
```java
/**
 * Unpack a jar file into a directory.
 *
 * This version unpacks all files inside the jar regardless of filename.
 *
 * @param jarFile the .jar file to unpack
 * @param toDir the destination directory into which to unpack the jar
 *
 * @throws IOException if an I/O error has occurred or toDir
 * cannot be created and does not already exist
 */
public static void unJar(File jarFile, File toDir) throws IOException {
  unJar(jarFile, toDir, MATCH_ANY);
}
```
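The `unJar` above delegates to an overload that takes a filename pattern (`MATCH_ANY`). As a hedged, JDK-only sketch of what the unpacking step involves, here is a standalone version; the class name `UnJarDemo` and the zip-slip guard are my additions, not Hadoop's code:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class UnJarDemo {

    // JDK-only sketch of what RunJar.unJar does: walk the jar's entries and
    // copy each regular file out under toDir. (The real RunJar also supports
    // a filename pattern and permission handling; this sketch does not.)
    static void unJar(File jarFile, File toDir) throws IOException {
        try (JarFile jar = new JarFile(jarFile)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                File out = new File(toDir, entry.getName());
                // Guard against "zip slip": reject entries escaping toDir.
                if (!out.getCanonicalPath().startsWith(toDir.getCanonicalPath())) {
                    throw new IOException("Blocked unsafe entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    out.mkdirs();
                    continue;
                }
                out.getParentFile().mkdirs();
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, out.toPath(), StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny jar in a temp directory, then unpack it.
        Path dir = Files.createTempDirectory("unjar-demo");
        File jarFile = dir.resolve("demo.jar").toFile();
        try (JarOutputStream jos = new JarOutputStream(new FileOutputStream(jarFile))) {
            jos.putNextEntry(new JarEntry("hello.txt"));
            jos.write("hello".getBytes());
            jos.closeEntry();
        }
        File toDir = dir.resolve("out").toFile();
        unJar(jarFile, toDir);
        System.out.println(new String(Files.readAllBytes(
                new File(toDir, "hello.txt").toPath()))); // prints "hello"
    }
}
```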
```java
/**
 * Creates a classloader based on the environment that was specified by the
 * user. If HADOOP_USE_CLIENT_CLASSLOADER is specified, it creates an
 * application classloader that provides the isolation of the user class space
 * from the hadoop classes and their dependencies. It forms a class space for
 * the user jar as well as the HADOOP_CLASSPATH. Otherwise, it creates a
 * classloader that simply adds the user jar to the classpath.
 */
private ClassLoader createClassLoader(File file, final File workDir)
    throws MalformedURLException {
  ClassLoader loader;
  // see if the client classloader is enabled
  if (useClientClassLoader()) {
    StringBuilder sb = new StringBuilder();
    sb.append(workDir).append("/").
        append(File.pathSeparator).append(file).
        append(File.pathSeparator).append(workDir).append("/classes/").
        append(File.pathSeparator).append(workDir).append("/lib/*");
    // HADOOP_CLASSPATH is added to the client classpath
    String hadoopClasspath = getHadoopClasspath();
    if (hadoopClasspath != null && !hadoopClasspath.isEmpty()) {
      sb.append(File.pathSeparator).append(hadoopClasspath);
    }
    String clientClasspath = sb.toString();
    // get the system classes
    String systemClasses = getSystemClasses();
    List<String> systemClassesList = systemClasses == null ?
        null :
        Arrays.asList(StringUtils.getTrimmedStrings(systemClasses));
```
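The `StringBuilder` in `createClassLoader` simply joins the unpacked work directory, the user jar, the `classes/` directory, and the `lib/*` wildcard with the platform path separator, appending `HADOOP_CLASSPATH` when present. A small self-contained sketch of that joining logic (`ClasspathDemo` and `buildClientClasspath` are illustrative names of mine, not Hadoop's):

```java
import java.io.File;

public class ClasspathDemo {

    // Mirrors the StringBuilder logic in createClassLoader above:
    // workDir/ : jarFile : workDir/classes/ : workDir/lib/* joined with
    // File.pathSeparator, with the extra classpath appended when present.
    static String buildClientClasspath(String workDir, String jarFile,
                                       String hadoopClasspath) {
        StringBuilder sb = new StringBuilder();
        sb.append(workDir).append("/")
          .append(File.pathSeparator).append(jarFile)
          .append(File.pathSeparator).append(workDir).append("/classes/")
          .append(File.pathSeparator).append(workDir).append("/lib/*");
        if (hadoopClasspath != null && !hadoopClasspath.isEmpty()) {
            sb.append(File.pathSeparator).append(hadoopClasspath);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Paths are placeholders; on Unix the entries come out colon-separated.
        System.out.println(buildClientClasspath("/tmp/unjar1", "/home/u/app.jar", null));
    }
}
```

The resulting string, together with the system-classes list, is what the client classloader uses to form a class space for the user jar that is isolated from Hadoop's own classes and their dependencies, as the Javadoc above describes.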