Azkaban 一个批量工作流任务调度器
为什么需要工作流调度系统
1)一个完整的数据分析系统通常都是由大量任务单元组成:
Shell脚本程序,Java程序,MapReduce程序、Hive脚本等
2)各任务单元之间存在时间先后及前后依赖关系
3)为了很好地组织起这样的复杂执行计划,需要一个工作流调度系统来调度执行;
常见工作流调度系统
1)简单的任务调度:直接使用Linux的Crontab来定义;
2)复杂的任务调度:开发调度平台或使用现成的开源调度系统,比如Ooize、Azkaban、 Airflow、DolphinScheduler等。
Azkaban与Oozie对比
总体来说,Ooize相比Azkaban是一个重量级的任务调度系统,功能全面,但配置使用也更复杂。如果可以不在意某些功能的缺失,轻量级调度器Azkaban是很不错的候选对象。
Azkaban的架构
Azkaban由三个关键组件构成:
- AzkabanWebServer:AzkabanWebServer是整个Azkaban工作流系统的主要管理者,它用户登录认证、负责project管理、定时执行工作流、跟踪工作流执行进度等一系列任务。
- AzkabanExecutorServer:负责具体的工作流的提交、执行,它们通过mysql数据库来协调任务的执行。
- 关系型数据库(MySQL):存储大部分执行流状态,AzkabanWebServer和AzkabanExecutorServer都需要访问数据库。
Azkaban下载地址
下载地址:http://azkaban.github.io/downloads.html
Azkaban安装部署
角色分配
主机名 |
AzkabanWebServer |
AzkabanExecutorServer |
Bus-HMgr-01 |
√ |
√ |
Bus-HMgr-02 |
√ | |
Bus-HData-01 |
√ |
安装前准备
- 将Azkaban Web服务器、Azkaban执行服务器、Azkaban的sql执行脚本及MySQL安装包拷贝到190.176.35.102服务器的/share/apps/azkaban/softwares目录下
- azkaban-web-server-3.84.4.tar.gz
- azkaban-exec-server-3.84.4.tar.gz
- azkaban-db-3.84.4.tar.gz
选择Mysql作为Azkaban数据库,因为Azkaban建立了一些Mysql连接增强功能,以方便Azkaban设置,并增强服务可靠性。
- 安装好MySQL数据库。
目前安装的数据库信息为:
190.176.32.78
root/Sibat@2021
安装Azkaban
1) 进入安装包所在的目录/share/apps/azkaban/softwares
$ cd /share/apps/azkaban/softwares
2) 解压azkaban-web-server-3.84.4.tar.gz、azkaban-exec-server-3.84.4.tar.gz、azkaban-db-3.84.4.tar.gz到/opt/module/azkaban目录下
$ tar -zxvf azkaban-web-server-3.84.4.tar.gz -C /share/apps/azkaban/
$ tar -zxvf azkaban-executor-server-3.84.4.tar.gz -C /share/apps/azkaban/
$ tar -zxvf azkaban-sql-script-3.84.4.tar.gz -C /share/apps/azkaban/
3) 对解压后的文件重新命名(也可不修改)
$ mv azkaban-web-3.84.4/ server
$ mv azkaban-executor-3.84.4/ executor
4) azkaban脚本导入
进入mysql,创建Azkaban用户,任何主机都可以访问Azkaban,密码是sibat706
Mysql> CREATE USER ‘azkaban’@’%’ IDENTIFIED BY ‘sibat706’;
赋予Azkaban用户增删改查权限
Mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON azkaban.* to ‘azkaban’@’%’ WITH GRANT OPTION;
创建azkaban数据库,并将解压的脚本导入到azkaban数据库。
mysql> create database azkaban;
mysql> use azkaban;
mysql> source /share/apps/azkaban/create-all-sql-3.84.4.sql
或者直接执行如下create-all-sql-3.84.4.sql的语句,创建azkaban所需要的表:
CREATE TABLE active_executing_flows (
exec_id INT,
update_time BIGINT,
PRIMARY KEY (exec_id)
);
CREATE TABLE active_sla (
exec_id INT NOT NULL,
job_name VARCHAR(128) NOT NULL,
check_time BIGINT NOT NULL,
rule TINYINT NOT NULL,
enc_type TINYINT,
options LONGBLOB NOT NULL,
PRIMARY KEY (exec_id, job_name)
);
CREATE TABLE execution_dependencies(
trigger_instance_id varchar(64),
dep_name varchar(128),
starttime bigint(20) not null,
endtime bigint(20),
dep_status tinyint not null,
cancelleation_cause tinyint not null,
project_id INT not null,
project_version INT not null,
flow_id varchar(128) not null,
flow_version INT not null,
flow_exec_id INT not null,
primary key(trigger_instance_id, dep_name)
);
CREATE INDEX ex_end_time
ON execution_dependencies (endtime);
CREATE TABLE execution_flows (
exec_id INT NOT NULL AUTO_INCREMENT,
project_id INT NOT NULL,
version INT NOT NULL,
flow_id VARCHAR(128) NOT NULL,
status TINYINT,
submit_user VARCHAR(64),
submit_time BIGINT,
update_time BIGINT,
start_time BIGINT,
end_time BIGINT,
enc_type TINYINT,
flow_data LONGBLOB,
executor_id INT DEFAULT NULL,
use_executor INT DEFAULT NULL,
flow_priority TINYINT NOT NULL DEFAULT 5,
PRIMARY KEY (exec_id)
);
CREATE INDEX ex_flows_start_time
ON execution_flows (start_time);
CREATE INDEX ex_flows_end_time
ON execution_flows (end_time);
CREATE INDEX ex_flows_time_range
ON execution_flows (start_time, end_time);
CREATE INDEX ex_flows_flows
ON execution_flows (project_id, flow_id);
CREATE INDEX executor_id
ON execution_flows (executor_id);
CREATE INDEX ex_flows_staus
ON execution_flows (status);
CREATE TABLE execution_jobs (
exec_id INT NOT NULL,
project_id INT NOT NULL,
version INT NOT NULL,
flow_id VARCHAR(128) NOT NULL,
job_id VARCHAR(512) NOT NULL,
attempt INT,
start_time BIGINT,
end_time BIGINT,
status TINYINT,
input_params LONGBLOB,
output_params LONGBLOB,
attachments LONGBLOB,
PRIMARY KEY (exec_id, job_id, flow_id, attempt)
);
CREATE INDEX ex_job_id
ON execution_jobs (project_id, job_id);
-- In table execution_logs, name is the combination of flow_id and job_id
--
-- prefix support and lengths of prefixes (where supported) are storage engine dependent.
-- By default, the index key prefix length limit is 767 bytes for innoDB.
-- from: https://dev.mysql.com/doc/refman/5.7/en/create-index.html
CREATE TABLE execution_logs (
exec_id INT NOT NULL,
name VARCHAR(640),
attempt INT,
enc_type TINYINT,
start_byte INT,
end_byte INT,
log LONGBLOB,
upload_time BIGINT,
PRIMARY KEY (exec_id, name, attempt, start_byte)
);
CREATE INDEX ex_log_attempt
ON execution_logs (exec_id, name, attempt);
CREATE INDEX ex_log_index
ON execution_logs (exec_id, name);
CREATE INDEX ex_log_upload_time
ON execution_logs (upload_time);
CREATE TABLE executor_events (
executor_id INT NOT NULL,
event_type TINYINT NOT NULL,
event_time DATETIME NOT NULL,
username VARCHAR(64),
message VARCHAR(512)
);
CREATE INDEX executor_log
ON executor_events (executor_id, event_time);
CREATE TABLE executors (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
host VARCHAR(64) NOT NULL,
port INT NOT NULL,
active BOOLEAN DEFAULT FALSE,
UNIQUE (host, port)
);
CREATE INDEX executor_connection
ON executors (host, port);
CREATE TABLE project_events (
project_id INT NOT NULL,
event_type TINYINT NOT NULL,
event_time BIGINT NOT NULL,
username VARCHAR(64),
message VARCHAR(512)
);
CREATE INDEX log
ON project_events (project_id, event_time);
CREATE TABLE project_files (
project_id INT NOT NULL,
version INT NOT NULL,
chunk INT,
size INT,
file LONGBLOB,
PRIMARY KEY (project_id, version, chunk)
);
CREATE INDEX file_version
ON project_files (project_id, version);
CREATE TABLE project_flow_files (
project_id INT NOT NULL,
project_version INT NOT NULL,
flow_name VARCHAR(128) NOT NULL,
flow_version INT NOT NULL,
modified_time BIGINT NOT NULL,
flow_file LONGBLOB,
PRIMARY KEY (project_id, project_version, flow_name, flow_version)
);
CREATE TABLE project_flows (
project_id INT NOT NULL,
version INT NOT NULL,
flow_id VARCHAR(128),
modified_time BIGINT NOT NULL,
encoding_type TINYINT,
json MEDIUMBLOB,
PRIMARY KEY (project_id, version, flow_id)
);
CREATE INDEX flow_index
ON project_flows (project_id, version);
CREATE TABLE project_permissions (
project_id VARCHAR(64) NOT NULL,
modified_time BIGINT NOT NULL,
name VARCHAR(64) NOT NULL,
permissions INT NOT NULL,
isGroup BOOLEAN NOT NULL,
PRIMARY KEY (project_id, name, isGroup)
);
CREATE INDEX permission_index
ON project_permissions (project_id);
CREATE TABLE project_properties (
project_id INT NOT NULL,
version INT NOT NULL,
name VARCHAR(255),
modified_time BIGINT NOT NULL,
encoding_type TINYINT,
property BLOB,
PRIMARY KEY (project_id, version, name)
);
CREATE INDEX properties_index
ON project_properties (project_id, version);
CREATE TABLE project_versions (
project_id INT NOT NULL,
version INT NOT NULL,
upload_time BIGINT NOT NULL,
uploader VARCHAR(64) NOT NULL,
file_type VARCHAR(16),
file_name VARCHAR(128),
md5 BINARY(16),
num_chunks INT,
resource_id VARCHAR(512) DEFAULT NULL,
startup_dependencies MEDIUMBLOB DEFAULT NULL,
uploader_ip_addr VARCHAR(50) DEFAULT NULL,
PRIMARY KEY (project_id, version)
);
CREATE INDEX version_index
ON project_versions (project_id);
CREATE TABLE projects (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(64) NOT NULL,
active BOOLEAN,
modified_time BIGINT NOT NULL,
create_time BIGINT NOT NULL,
version INT,
last_modified_by VARCHAR(64) NOT NULL,
description VARCHAR(2048),
enc_type TINYINT,
settings_blob LONGBLOB
);
CREATE INDEX project_name
ON projects (name);
CREATE TABLE properties (
name VARCHAR(64) NOT NULL,
type INT NOT NULL,
modified_time BIGINT NOT NULL,
value VARCHAR(256),
PRIMARY KEY (name, type)
);
-- This file collects all quartz table create statement required for quartz 2.2.1
--
-- We are using Quartz 2.2.1 tables, the original place of which can be found at
-- https://github.com/quartz-scheduler/quartz/blob/quartz-2.2.1/distribution/src/main/assembly/root/docs/dbTables/tables_mysql.sql
DROP TABLE IF EXISTS QRTZ_FIRED_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_PAUSED_TRIGGER_GRPS;
DROP TABLE IF EXISTS QRTZ_SCHEDULER_STATE;
DROP TABLE IF EXISTS QRTZ_LOCKS;
DROP TABLE IF EXISTS QRTZ_SIMPLE_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_SIMPROP_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_CRON_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_BLOB_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_JOB_DETAILS;
DROP TABLE IF EXISTS QRTZ_CALENDARS;
CREATE TABLE QRTZ_JOB_DETAILS
(
SCHED_NAME VARCHAR(120) NOT NULL,
JOB_NAME VARCHAR(200) NOT NULL,
JOB_GROUP VARCHAR(200) NOT NULL,
DESCRIPTION VARCHAR(250) NULL,
JOB_CLASS_NAME VARCHAR(250) NOT NULL,
IS_DURABLE VARCHAR(1) NOT NULL,
IS_NONCONCURRENT VARCHAR(1) NOT NULL,
IS_UPDATE_DATA VARCHAR(1) NOT NULL,
REQUESTS_RECOVERY VARCHAR(1) NOT NULL,
JOB_DATA BLOB NULL,
PRIMARY KEY (SCHED_NAME,JOB_NAME,JOB_GROUP)
);
CREATE TABLE QRTZ_TRIGGERS
(
SCHED_NAME VARCHAR(120) NOT NULL,
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
JOB_NAME VARCHAR(200) NOT NULL,
JOB_GROUP VARCHAR(200) NOT NULL,
DESCRIPTION VARCHAR(250) NULL,
NEXT_FIRE_TIME BIGINT(13) NULL,
PREV_FIRE_TIME BIGINT(13) NULL,
PRIORITY INTEGER NULL,
TRIGGER_STATE VARCHAR(16) NOT NULL,
TRIGGER_TYPE VARCHAR(8) NOT NULL,
START_TIME BIGINT(13) NOT NULL,
END_TIME BIGINT(13) NULL,
CALENDAR_NAME VARCHAR(200) NULL,
MISFIRE_INSTR SMALLINT(2) NULL,
JOB_DATA BLOB NULL,
PRIMARY KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (SCHED_NAME,JOB_NAME,JOB_GROUP)
REFERENCES QRTZ_JOB_DETAILS(SCHED_NAME,JOB_NAME,JOB_GROUP)
);
CREATE TABLE QRTZ_SIMPLE_TRIGGERS
(
SCHED_NAME VARCHAR(120) NOT NULL,
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
REPEAT_COUNT BIGINT(7) NOT NULL,
REPEAT_INTERVAL BIGINT(12) NOT NULL,
TIMES_TRIGGERED BIGINT(10) NOT NULL,
PRIMARY KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES QRTZ_TRIGGERS(SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
);
CREATE TABLE QRTZ_CRON_TRIGGERS
(
SCHED_NAME VARCHAR(120) NOT NULL,
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
CRON_EXPRESSION VARCHAR(200) NOT NULL,
TIME_ZONE_ID VARCHAR(80),
PRIMARY KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES QRTZ_TRIGGERS(SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
);
CREATE TABLE QRTZ_SIMPROP_TRIGGERS
(
SCHED_NAME VARCHAR(120) NOT NULL,
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
STR_PROP_1 VARCHAR(512) NULL,
STR_PROP_2 VARCHAR(512) NULL,
STR_PROP_3 VARCHAR(512) NULL,
INT_PROP_1 INT NULL,
INT_PROP_2 INT NULL,
LONG_PROP_1 BIGINT NULL,
LONG_PROP_2 BIGINT NULL,
DEC_PROP_1 NUMERIC(13,4) NULL,
DEC_PROP_2 NUMERIC(13,4) NULL,
BOOL_PROP_1 VARCHAR(1) NULL,
BOOL_PROP_2 VARCHAR(1) NULL,
PRIMARY KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES QRTZ_TRIGGERS(SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
);
CREATE TABLE QRTZ_BLOB_TRIGGERS
(
SCHED_NAME VARCHAR(120) NOT NULL,
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
BLOB_DATA BLOB NULL,
PRIMARY KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES QRTZ_TRIGGERS(SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
);
CREATE TABLE QRTZ_CALENDARS
(
SCHED_NAME VARCHAR(120) NOT NULL,
CALENDAR_NAME VARCHAR(200) NOT NULL,
CALENDAR BLOB NOT NULL,
PRIMARY KEY (SCHED_NAME,CALENDAR_NAME)
);
CREATE TABLE QRTZ_PAUSED_TRIGGER_GRPS
(
SCHED_NAME VARCHAR(120) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
PRIMARY KEY (SCHED_NAME,TRIGGER_GROUP)
);
CREATE TABLE QRTZ_FIRED_TRIGGERS
(
SCHED_NAME VARCHAR(120) NOT NULL,
ENTRY_ID VARCHAR(95) NOT NULL,
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
INSTANCE_NAME VARCHAR(200) NOT NULL,
FIRED_TIME BIGINT(13) NOT NULL,
SCHED_TIME BIGINT(13) NOT NULL,
PRIORITY INTEGER NOT NULL,
STATE VARCHAR(16) NOT NULL,
JOB_NAME VARCHAR(200) NULL,
JOB_GROUP VARCHAR(200) NULL,
IS_NONCONCURRENT VARCHAR(1) NULL,
REQUESTS_RECOVERY VARCHAR(1) NULL,
PRIMARY KEY (SCHED_NAME,ENTRY_ID)
);
CREATE TABLE QRTZ_SCHEDULER_STATE
(
SCHED_NAME VARCHAR(120) NOT NULL,
INSTANCE_NAME VARCHAR(200) NOT NULL,
LAST_CHECKIN_TIME BIGINT(13) NOT NULL,
CHECKIN_INTERVAL BIGINT(13) NOT NULL,
PRIMARY KEY (SCHED_NAME,INSTANCE_NAME)
);
CREATE TABLE QRTZ_LOCKS
(
SCHED_NAME VARCHAR(120) NOT NULL,
LOCK_NAME VARCHAR(40) NOT NULL,
PRIMARY KEY (SCHED_NAME,LOCK_NAME)
);
commit;
CREATE TABLE ramp (
rampId VARCHAR(45) NOT NULL,
rampPolicy VARCHAR(45) NOT NULL,
maxFailureToPause INT NOT NULL DEFAULT 0,
maxFailureToRampDown INT NOT NULL DEFAULT 0,
isPercentageScaleForMaxFailure TINYINT NOT NULL DEFAULT 0,
startTime BIGINT NOT NULL DEFAULT 0,
endTime BIGINT NOT NULL DEFAULT 0,
lastUpdatedTime BIGINT NOT NULL DEFAULT 0,
numOfTrail INT NOT NULL DEFAULT 0,
numOfFailure INT NOT NULL DEFAULT 0,
numOfSuccess INT NOT NULL DEFAULT 0,
numOfIgnored INT NOT NULL DEFAULT 0,
isPaused TINYINT NOT NULL DEFAULT 0,
rampStage TINYINT NOT NULL DEFAULT 0,
isActive TINYINT NOT NULL DEFAULT 0,
PRIMARY KEY (rampId)
);
CREATE INDEX idx_ramp
ON ramp (rampId);
CREATE TABLE ramp_dependency (
dependency VARCHAR(45) NOT NULL,
defaultValue VARCHAR (500),
jobtypes VARCHAR (1000),
PRIMARY KEY (dependency)
);
CREATE INDEX idx_ramp_dependency
ON ramp_dependency(dependency);
CREATE TABLE ramp_exceptional_flow_items (
rampId VARCHAR(45) NOT NULL,
flowId VARCHAR(128) NOT NULL,
treatment VARCHAR(1) NOT NULL,
timestamp BIGINT NULL,
PRIMARY KEY (rampId, flowId)
);
CREATE INDEX idx_ramp_exceptional_flow_items
ON ramp_exceptional_flow_items (rampId, flowId);
CREATE TABLE ramp_exceptional_job_items (
rampId VARCHAR(45) NOT NULL,
flowId VARCHAR(128) NOT NULL,
jobId VARCHAR(128) NOT NULL,
treatment VARCHAR(1) NOT NULL,
timestamp BIGINT NULL,
PRIMARY KEY (rampId, flowId, jobId)
);
CREATE INDEX idx_ramp_exceptional_job_items
ON ramp_exceptional_job_items (rampId, flowId, jobId);
CREATE TABLE ramp_items (
rampId VARCHAR(45) NOT NULL,
dependency VARCHAR(45) NOT NULL,
rampValue VARCHAR (500) NOT NULL,
PRIMARY KEY (rampId, dependency)
);
CREATE INDEX idx_ramp_items
ON ramp_items (rampId, dependency);
CREATE TABLE triggers (
trigger_id INT NOT NULL AUTO_INCREMENT,
trigger_source VARCHAR(128),
modify_time BIGINT NOT NULL,
enc_type TINYINT,
data LONGBLOB,
PRIMARY KEY (trigger_id)
);
CREATE TABLE validated_dependencies (
file_name VARCHAR(128),
file_sha1 CHAR(40),
validation_key CHAR(40),
validation_status INT,
PRIMARY KEY (validation_key, file_name, file_sha1)
);
配置Excecuture Server
编辑azkaban.properties
主要修改的配置为(时区,webserver的url,数据库MySQL):
- default.timezone.id=Asia/Shanghai
- jetty.port=8081
- # Where the Azkaban web server is located
- azkaban.webserver.url=http://190.176.35.102:8081
- executor.port=12321
- database.type=mysql
- mysql.port=3306
- mysql.host=190.176.32.78
- mysql.database=azkaban
- mysql.user=azkaban
- mysql.password=sibat706
vim /azkaban/azkaban-exec-server-3.84.4
# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8081
# Where the Azkaban web server is located
azkaban.webserver.url=http://190.176.35.102:8081
executor.port=12321
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=plugins/jobtypes
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=190.176.32.78
mysql.database=azkaban
mysql.user=azkaban
mysql.password=sibat706
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.flow.threads=30
配置 Web Server
编辑azkaban.properties
主要修改的配置为(时区,数据库MySQL):
- default.timezone.id=Asia/Shanghai
- jetty.port=8081
- database.type=mysql
- mysql.port=3306
- mysql.host=190.176.32.78
- mysql.database=azkaban
- mysql.user=azkaban
- mysql.password=sibat706
vim /azkaban/azkaban-web-server-3.84.4
# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8081
# Azkaban Executor settings
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=190.176.32.78
mysql.database=azkaban
mysql.user=azkaban
mysql.password=sibat706
mysql.numconnections=100
#Multiple Executor
azkaban.use.multiple.executors=true
#azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.filters=StaticRemainingFlowSize,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
web服务器用户配置
在azkaban web服务器安装目录 conf目录,按照如下配置修改azkaban-users.xml 文件,增加管理员用户。
$ vim azkaban-users.xml
<azkaban-users>
<user groups="azkaban" password="azkaban" roles="admin" username="azkaban"/>
<user password="metrics" roles="metrics" username="metrics"/>
<user password="sibat706" roles="admin" username="admin"/>
<role name="admin" permissions="ADMIN"/>
<role name="metrics" permissions="METRICS"/>
</azkaban-users>
启动executor服务器
在executor服务器目录下执行启动命令
$ pwd
/share/apps/azkaban/azkaban-exec-server-3.84.4
$ bin/azkaban-executor-start.sh
紧接这激活executor,分别在Bus-HMgr-01、Bus-HMgr-02、Bus-HData-01上面执行如下命令:
$ curl -G “Bus-HMgr-01:12321/executor?action=activate” && echo
$ curl -G “Bus-HMgr-02:12321/executor?action=activate” && echo
$ curl -G “Bus-HData-01:12321/executor?action=activate” && echo
可以在MySQL数据库azkaban中看到结果,active变为1,如下图所示:
启动web服务器
在azkaban web服务器Bus-HMgr-01目录下执行启动命令
$ pwd
/share/apps/azkaban/azkaban-web-server-3.84.4
$ bin/azkaban-web-start.sh
注意:
先执行executor,再执行web,避免Web Server会因为找不到执行器启动失败。
jps查看进程
$ jps
3601 AzkabanExecutorServer
5880 Jps
3661 AzkabanWebServer
启动完成后,在浏览器(建议使用谷歌浏览器)中输入https://服务器IP地址:8081,即可访问azkaban服务了。
在登录中输入刚才在azkaban-users.xml文件中新添加的户用名及密码,点击 login。