Sqoop extraction examples

Latest recommended article published 2022-03-10 15:30:19


License: CC 4.0 BY-SA


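This post shows how to import data from MySQL into Hive with Sqoop: creating the Hive tables, configuring the JDBC connection, and running the import. It gives examples for a textfile table, a Parquet table, and a partitioned table, and recommends declaring the Hive columns as string.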


## Create the table in MySQL
create table dict_dir(
id varchar(10) comment 'primary key',
name varchar(20) comment 'name',
class varchar(25) comment 'category'
) comment 'dictionary table';

insert into dict_dir values
('1_1','apple','fruits'),
('1_2','banana','fruits'),
('1_3','orange','fruits'),
('2_1','cabbage','vegetables'),
('2_2','radish','vegetables'),
('3_1','beef','meat'),
('3_2','mutton','meat');
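Before importing, it can help to confirm from the Sqoop host that the JDBC connection works and the rows above are visible. sqoop eval runs a single SQL statement against the source database and prints the result. The following is a minimal sketch; the function name check_source and its argument order are hypothetical.

```shell
# Hypothetical helper: count the rows of a source table through Sqoop.
# Arguments: JDBC URL, user name, password, table name.
check_source() {
  local url=$1 user=$2 pass=$3 table=$4
  # sqoop eval executes the statement on the source database and prints the result
  sqoop eval \
    --connect "$url" \
    --username "$user" \
    --password "$pass" \
    --query "select count(*) from ${table}"
}
```

Once the variables from the shell section are set, call it as check_source "$jdbc_url" "$username" "$password" dict_dir; for the sample data it should report 7 rows.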

############# Run in Hive
use ods;
create table ods_dict_dir_textfile(
id string comment 'primary key',
name string comment 'name',
class string comment 'category'
)comment 'dictionary table'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES('field.delim'='\001', 'serialization.encoding'='utf8')
STORED AS TEXTFILE;

create table ods_dict_dir_parquet(
id string comment 'primary key',
name string comment 'name',
class string comment 'category'
)comment 'dictionary table'
STORED AS PARQUET;

create table ods_dict_dir_text_partition(
id string comment 'primary key',
name string comment 'name',
class string comment 'category'
)comment 'dictionary table'
partitioned by(l_date string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES('field.delim'='\001', 'serialization.encoding'='utf8')
STORED AS TEXTFILE;
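The three DDL statements differ only in table name and storage clause, and the notes at the end recommend string for every receiving column. A sketch that prints the textfile variant for any column list follows; gen_hive_ddl is a hypothetical name, and it omits the column and table comments.

```shell
# Hypothetical helper: print a Hive CREATE TABLE statement in which every
# column is typed as string, using the same \001-delimited textfile layout.
# Arguments: table name, then one or more column names.
gen_hive_ddl() {
  local table=$1
  shift
  local last=$#   # number of columns left after removing the table name
  local i=0 col
  printf 'create table %s(\n' "$table"
  for col in "$@"; do
    i=$((i + 1))
    if [ "$i" -lt "$last" ]; then
      printf '  %s string,\n' "$col"
    else
      printf '  %s string\n' "$col"   # no trailing comma on the last column
    fi
  done
  # fixed tail; the quoted heredoc keeps \001 literal
  cat <<'EOF'
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES('field.delim'='\001', 'serialization.encoding'='utf8')
STORED AS TEXTFILE;
EOF
}
```

For example, gen_hive_ddl ods_dict_dir_textfile id name class prints a statement that can be pasted into Hive after use ods;.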

####################### Run in the shell
jdbc_url="jdbc:mysql://10.200.1.101:3306/mysql_db?zeroDateTimeBehavior=convertToNull&useSSL=false&tinyInt1isBit=false"
username=half
password=123456
query_sql="select id,name,class from dict_dir where \$CONDITIONS "
hdfs_dir=/apps/hive3/warehouse/opt.db/ods_dict_dir
hive_db_name=ods
hive_table_textfile=ods_dict_dir_textfile
hive_table_parquet=ods_dict_dir_parquet
hive_table_text_partition=ods_dict_dir_text_partition
source_tab_name=dict_dir
v_date=$(date -d '-1 day' +%Y-%m-%d)   # partition value for l_date; assumed here: yesterday (GNU date)
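An unset variable does not stop Sqoop from running; it just receives an empty option value (for example a misspelled $user_name instead of $username). A minimal sketch that checks the variables by name before running any import; require_vars is a hypothetical helper and relies on bash indirect expansion.

```shell
# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_vars() {
  local var_name missing=0
  for var_name in "$@"; do
    # ${!var_name} is bash indirect expansion: the value of the variable named $var_name
    if [ -z "${!var_name:-}" ]; then
      echo "missing variable: $var_name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use at the top of the script: require_vars jdbc_url username password query_sql hdfs_dir hive_db_name v_date || exit 1.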

####### MySQL to Hive, Hive table stored as Parquet
sqoop import \
--connect "$jdbc_url" \
--username "$username" \
--password "$password" \
--query "$query_sql" \
--target-dir "$hdfs_dir" \
--null-string '\\N' --null-non-string '\\N' \
--hive-import --hive-database "$hive_db_name" --hive-table "$hive_table_parquet" \
--hive-overwrite \
--as-parquetfile \
--mapreduce-job-name "$hive_table_parquet" \
-m 1 && rm -f QueryResult.java
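The two --query imports share most of their options. One way to keep them consistent is to collect the options in a bash array, which also keeps values containing spaces (such as the query) as single arguments. This is a sketch, not the author's script: import_query_to_hive is a hypothetical name, and it expects the variables defined in the shell section.

```shell
# Hypothetical wrapper for the --query imports.
# Arguments: format ("parquet" or "text"), Hive table name.
# Reads jdbc_url, username, password, query_sql, hdfs_dir, hive_db_name.
import_query_to_hive() {
  local format=$1 table=$2
  local -a args
  args=(
    import
    --connect "$jdbc_url"
    --username "$username"
    --password "$password"
    --query "$query_sql"
    --target-dir "$hdfs_dir"
    --null-string '\\N' --null-non-string '\\N'
    --hive-import --hive-database "$hive_db_name" --hive-table "$table"
    -m 1 --mapreduce-job-name "sqoop_$table"
  )
  case $format in
    parquet) args+=(--hive-overwrite --as-parquetfile) ;;
    text)    args+=(--as-textfile --fields-terminated-by '\001'
                    --hive-delims-replacement ',' --delete-target-dir) ;;
    *) echo "unknown format: $format" >&2; return 2 ;;
  esac
  # remove the generated QueryResult.java only if the import succeeded
  sqoop "${args[@]}" && rm -f QueryResult.java
}
```

For example: import_query_to_hive parquet "$hive_table_parquet", or import_query_to_hive text "$hive_table_textfile".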

####### MySQL to Hive, Hive table stored as textfile
sqoop import \
--connect "$jdbc_url" \
--username "$username" \
--password "$password" \
--query "$query_sql" \
--as-textfile \
--target-dir "$hdfs_dir" \
--fields-terminated-by '\001' \
--hive-delims-replacement ',' \
--delete-target-dir \
--null-string '\\N' --null-non-string '\\N' \
--hive-import --hive-database "$hive_db_name" --hive-table "$hive_table_textfile" \
-m 1 --mapreduce-job-name "sqoop_$hive_table_textfile" && rm -f QueryResult.java
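The textfile tables use the invisible \001 (Ctrl-A) field separator, so raw files read from HDFS are hard to inspect. A minimal sketch that swaps the separator for | with tr; show_delimited is a hypothetical name, and the table location is a placeholder to be taken from DESCRIBE FORMATTED in Hive.

```shell
# Hypothetical helper: replace the \001 field separator with '|' on stdin.
show_delimited() {
  tr '\001' '|'
}
```

Usage: hdfs dfs -cat '<table location>/*' | show_delimited | head. NULL values show up as \N, matching --null-string and --null-non-string.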

####### MySQL to Hive, partitioned textfile table
sqoop import \
--connect "$jdbc_url" \
--username "$username" \
--password "$password" \
--table "$source_tab_name" \
-m 1 \
--null-string '\\N' \
--null-non-string '\\N' \
--hive-overwrite \
--fields-terminated-by '\001' \
--hive-delims-replacement ',' \
--hive-import \
--hive-table "$hive_db_name.$hive_table_text_partition" \
--hive-partition-key l_date \
--hive-partition-value "$v_date" \
--mapreduce-job-name "sqoop_$hive_table_text_partition"
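The partitioned import loads one l_date partition per run. To backfill several days, the command can be run in a loop over dates. A sketch of a date generator, assuming GNU date (the -d option); date_range is a hypothetical name.

```shell
# Hypothetical helper: print every date from START to END inclusive (yyyy-mm-dd).
# TZ=UTC avoids daylight-saving surprises when stepping one day at a time.
date_range() {
  local d=$1 end=$2
  # ISO dates compare correctly as strings
  while [[ ! "$d" > "$end" ]]; do
    echo "$d"
    d=$(TZ=UTC date -d "$d + 1 day" +%F) || return 1
  done
}
```

Usage: for v_date in $(date_range 2022-03-01 2022-03-07); do ...partitioned import...; done. Because of --hive-overwrite, rerunning a date replaces that partition instead of appending to it.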

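Notes:
1. With --query, "where \$CONDITIONS" is mandatory: each Sqoop mapper replaces $CONDITIONS with its own split condition. Further conditions can be added with "and". The backslash is needed only because the query is in double quotes; inside single quotes, write $CONDITIONS without it.
2. A trailing \ continues the command on the next line. Put one space before it and nothing after it, not even a space.
3. JDBC URLs differ between databases. Quote a URL whenever it contains shell metacharacters, such as the & in the MySQL URL or the ; in the SQL Server URL, otherwise the shell cuts the command at that character. Test the quoting in your own environment.
4. When in doubt, run sqoop help (or sqoop help import) to read the option descriptions.
5. In the Hive ods database, declare the receiving columns as string. Otherwise date columns may arrive as 11-digit numbers, and bigint columns may be truncated when the tables are joined.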
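Notes 1 and 3 are both about shell quoting. The demo below shows that the double-quoted and single-quoted spellings hand the same literal $CONDITIONS to Sqoop, and that the query variable must itself be quoted when passed, or the shell splits it into several arguments. count_args and the q_* variables are illustrative names only.

```shell
# Demo helper: print how many arguments the shell actually passes.
count_args() { echo "$#"; }

# Both spellings leave the literal text $CONDITIONS for Sqoop to substitute.
q_double="select id,name,class from dict_dir where \$CONDITIONS"
q_single='select id,name,class from dict_dir where $CONDITIONS'

# An extra filter combined with $CONDITIONS through "and".
q_filtered="select id,name,class from dict_dir where class = 'fruits' and \$CONDITIONS"

count_args $q_double     # unquoted: word-split, prints 6
count_args "$q_double"   # quoted: one argument, prints 1
```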
Common JDBC URLs:
mysql_jdbc_url="jdbc:mysql://10.200.1.101:3306/mysql_db?zeroDateTimeBehavior=convertToNull&useSSL=false&tinyInt1isBit=false"
sqlserver_jdbc_url='jdbc:sqlserver://10.200.1.102:1433;DatabaseName=huca'
oracle_jdbc_url="jdbc:oracle:thin:@10.200.1.103:1521:TIMAL"
db2_jdbc_url="jdbc:db2://10.200.1.104:60000/dbins"
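The URL formats above can be wrapped in a small function so that only host, port and database change between environments. A sketch; jdbc_url_for is a hypothetical name, and the MySQL query parameters are copied from the list above.

```shell
# Hypothetical helper: build a JDBC URL in the formats listed above.
# Arguments: database type, host, port, database name (Oracle: SID).
jdbc_url_for() {
  local type=$1 host=$2 port=$3 db=$4
  case $type in
    mysql)     echo "jdbc:mysql://${host}:${port}/${db}?zeroDateTimeBehavior=convertToNull&useSSL=false&tinyInt1isBit=false" ;;
    sqlserver) echo "jdbc:sqlserver://${host}:${port};DatabaseName=${db}" ;;
    oracle)    echo "jdbc:oracle:thin:@${host}:${port}:${db}" ;;
    db2)       echo "jdbc:db2://${host}:${port}/${db}" ;;
    *) echo "unsupported database type: $type" >&2; return 1 ;;
  esac
}
```

For example, jdbc_url=$(jdbc_url_for mysql 10.200.1.101 3306 mysql_db); keep passing it as --connect "$jdbc_url" so the & and ; stay inside one argument.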