Using DataX

I. Deploying DataX

1. Upload the datax archive and extract it

tar -zxvf datax.tar.gz -C /usr/local/soft/

2. Run the built-in self-check job (from the datax directory)

[root@master datax]# python ./bin/datax.py ./job/job.json

If this job completes without errors, the installation is working.
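
A successful run ends with a job statistics summary roughly like the following (a sketch with illustrative values; the key check is that 读写失败总数, the failed-record count, is 0):

任务启动时刻                    : 2023-01-01 10:00:00
任务结束时刻                    : 2023-01-01 10:00:11
任务总计耗时                    :                 11s
读出记录总数                    :              100000
读写失败总数                    :                   0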

II. Using DataX

MySQL to MySQL

1. Generate a job template

[root@master datax]# python ./bin/datax.py -r mysqlreader -w mysqlwriter

2. The command prints a JSON template for the chosen reader/writer pair; edit it (the official plugin docs describe every parameter) to define the sync job:

{
	"job": {
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"column": [
						"id",
						"name",
						"age",
						"gender",
						"clazz",
						"last_mod"
					],
					"connection": [{
						"jdbcUrl": ["jdbc:mysql://master:3306/student"],
						"table": ["student"]
					}],
					"password": "123456",
					"username": "root"
				}
			},
			"writer": {
				"name": "mysqlwriter",
				"parameter": {
					"column": [
						"id",
						"name",
						"age",
						"gender",
						"clazz",
						"last_mod"
					],
					"connection": [{
						"jdbcUrl": "jdbc:mysql://master:3306/student2?useUnicode=true&characterEncoding=utf8",
						"table": ["student2"]
					}],
					"preSql": [
						"truncate table student2"
					],
					"password": "123456",
					"username": "root",
					"writeMode": "insert"
				}
			}
		}],
		"setting": {
			"speed": {
				"channel": "5"
			}
		}
	}
}
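
Two writer settings are worth noting: preSql runs once before the load (here it truncates student2, so the job can be re-run safely), and writeMode selects the write statement; besides insert, the mysqlwriter docs also list replace and update modes for idempotent writes against tables with a primary or unique key.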

3. Run the job

[root@master dataxjsons]# datax.py mysql2mysql.json
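
A quick verification is to compare row counts on both sides (database and table names taken from the job above):

SELECT COUNT(*) FROM student.student;
SELECT COUNT(*) FROM student2.student2;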

MySQL to HDFS

1. Generate a template

[root@master dataxjsons]# python /usr/local/soft/datax/bin/datax.py -r mysqlreader -w hdfswriter

2. Edit the template:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": ["*"], 
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://master:3306/student"], 
                                "table": ["student"]
                            }
                        ], 
                        "password": "123456", 
                        "username": "root"
                    }
                }, 
                "writer": {
                    "name": "hdfswriter", 
                    "parameter": {
                        "column": [
                        {
                                "name": "col1",
                                "type": "int"
                            },
                            {
                                "name": "col2",
                                "type": "String"
                            },
                            {
                                "name": "col3",
                                "type": "int"
                            },
                            {
                                "name": "col4",
                                "type": "String"
                            },
                            {
                                "name": "col5",
                                "type": "String"
                            },
                            {
                                "name": "col6",
                                "type": "Date"
                            }
                        ],  
                        "defaultFS": "hdfs://master:9000", 
                        "fieldDelimiter": ",", 
                        "fileName": "msql2hdfs", 
                        "fileType": "text", 
                        "path": "/shujia/bigdata17/datax/", 
                        "writeMode": "append"
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
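
Since the reader uses "column": ["*"], the writer's columns are matched to the source columns purely by position, so the list must contain one entry per source column, in source order; the col1...col6 names are just labels and do not need to match the MySQL column names.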

3. Run the job

[root@master dataxjsons]# datax.py mysql2hdfs.json
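
To inspect the result, list the target directory; hdfswriter appends a random suffix to fileName, so match the file with a wildcard (paths taken from the job above):

hdfs dfs -ls /shujia/bigdata17/datax/
hdfs dfs -cat /shujia/bigdata17/datax/msql2hdfs*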

MySQL to Hive

1. Create the Hive table

CREATE EXTERNAL TABLE IF NOT EXISTS student2(
    id BIGINT,
    name STRING,
    age INT,
    gender STRING,
    clazz STRING,
    last_mod STRING
)
COMMENT '学生表'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
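
Create the table inside the bigdata17 database (run use bigdata17; first), so that its warehouse directory matches the path used by the writer in step 3: /user/hive/warehouse/bigdata17.db/student2/.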

2. Generate a template

[root@master dataxjsons]# python /usr/local/soft/datax/bin/datax.py -r mysqlreader -w hdfswriter

3. Edit the template:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": ["*"], 
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://master:3306/student"], 
                                "table": ["student"]
                            }
                        ], 
                        "password": "123456", 
                        "username": "root"
                    }
                }, 
                "writer": {
                    "name": "hdfswriter", 
                    "parameter": {
                        "column": [
                        {
                                "name": "id",
                                "type": "bigint"
                            },
                            {
                                "name": "name",
                                "type": "string"
                            },
                            {
                                "name": "age",
                                "type": "INT"
                            },
                            {
                                "name": "gender",
                                "type": "string"
                            },
                            {
                                "name": "clazz",
                                "type": "string"
                            },
                            {
                                "name": "last_mod",
                                "type": "string"
                            }
                        ], 
                        "defaultFS": "hdfs://master:9000", 
                        "fieldDelimiter": ",", 
                        "fileName": "student2", 
                        "fileType": "text", 
                        "path": "/user/hive/warehouse/bigdata17.db/student2/", 
                        "writeMode": "append"
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}

4. Run the job

[root@master dataxjsons]# datax.py mysql2hive.json

Syncing data into Hive is simply syncing files into the Hive table's directory on HDFS.
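
The rows are therefore queryable as soon as the job finishes, e.g.:

hive> select * from bigdata17.student2 limit 10;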

For incremental sync, add a filter condition through the reader's where parameter, e.g. "where": "id > 7"; a sketch follows.
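
A minimal sketch of where the condition goes (only the reader block is shown; the rest of the job is unchanged):

"reader": {
    "name": "mysqlreader",
    "parameter": {
        "column": ["*"],
        "connection": [{
            "jdbcUrl": ["jdbc:mysql://master:3306/student"],
            "table": ["student"]
        }],
        "username": "root",
        "password": "123456",
        "where": "id > 7"
    }
}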

MySQL to HBase

1. Create the HBase table

hbase(main):003:0> create 'datastudent','info' 

2. Generate a template

[root@master dataxjsons]# python /usr/local/soft/datax/bin/datax.py -r mysqlreader -w hbase11xwriter

3. Edit the template:

{
	"job": {
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"column": [
						"id",
						"name",
						"age",
						"gender",
						"clazz",
						"last_mod"
					],
					"connection": [{
						"jdbcUrl": ["jdbc:mysql://master:3306/student"],
						"table": ["student"]
					}],
					"password": "123456",
					"username": "root"
				}
			},
			"writer": {
				"name": "hbase11xwriter",
				"parameter": {
					"column": [{
							"index": 1,
							"name": "info:name",
							"type": "string"
						},
						{
							"index": 2,
							"name": "info:age",
							"type": "int"
						},
						{
							"index": 3,
							"name": "info:gender",
							"type": "string"
						},
						{
							"index": 5,
							"name": "info:last_mod",
							"type": "string"
						}
					],
					"encoding": "utf-8",
					"hbaseConfig": {
						"hbase.zookeeper.quorum": "master:2181,node1:2181,node2:2181"
					},
					"mode": "normal",
					"rowkeyColumn": [{
							"index": 0,
							"type": "string"
						},
						{
							"index": -1,
							"type": "string",
							"value": "_"
						},
						{
							"index": 4,
							"type": "string"
						}
					],
					"table": "datastudent"
				}
			}
		}],
		"setting": {
			"speed": {
				"channel": "5"
			}
		}
	}
}
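
Per the rowkeyColumn setting above, the rowkey is built by concatenating source column 0 (id), the literal "_" (index -1 marks a constant value), and source column 4 (clazz); that is also why clazz is not mapped again in the column list.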

4. Run the job

[root@master dataxjsons]# datax.py mysql2hbase.json
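
To spot-check the result in the HBase shell:

hbase(main):004:0> scan 'datastudent', {LIMIT => 5}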

HBase to MySQL

1. Generate a template

python /usr/local/soft/datax/bin/datax.py -r hbase11xreader -w mysqlwriter

2. Edit the template:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hbase11xreader", 
                    "parameter": {
                        "column": [
                        {
                                "name": "rowkey",
                                "type": "string"
                            },
                            {
                                "name": "info: name",
                                "type": "string"
                            },
                            {
                                "name": "info: age",
                                "type": "int"
                            },
                            {
                                "name": "info: gender",
                                "type": "string"
                            },
                            {
                                "name": "info: last_mod",
                                "type": "string"
                            },
                        ], 
                        "encoding": "utf-8", 
                        "hbaseConfig": {
                            "hbase.zookeeper.quorum": "master:2181,node1:2181,node2:2181"
                        }, 
                        "mode": "normal", 
                        "table": "datastudent"
                    }
                }, 
                "writer": {
                    "name": "mysqlwriter", 
                    "parameter": {
                        "column": [
                        "id",
                        "name",
                        "age",
                        "gender",
                        "last_mod"
                        ], 
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://master:3306/student?useUnicode=true&characterEncoding=utf8", 
                                "table": ["student_copy1"]
                            }
                        ], 
                        "password": "123456", 
                        "username": "root", 
                        "writeMode": "append"
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "5"
            }
        }
    }
}
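
Note that in hbase11xreader the column name rowkey is reserved: it maps the row key itself into the record (here it lands in the first MySQL column, id), while every other column must use the family:qualifier form without spaces; "mode": "normal" reads the latest version of each cell.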

3. Run the job

[root@master dataxjsons]# datax.py hbase2mysql.json

You can now query MySQL to check the data synced over from HBase, for example:
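
SELECT * FROM student.student_copy1 LIMIT 10;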
