Importing Data from Hive into Greenplum

I. Importing Data into Greenplum with gpfdist External Tables

1. Start the gpfdist service. Relevant parameters:

/usr/local/greenplum-db/bin/gpfdist -d /home/gpadmin/data -p 8787 -l /home/gpadmin/data/interdir/gplog/gpfdist_8787.log

-d: directory where the external table data files are stored

-p: port number

-l: log file

2. Verify that the gpfdist service is running, using ps or the jobs command. The output looks like this:

ps -ef | grep gpfdist    # or simply: jobs

[2]+  Running                 gpfdist -d /export/gpdata/gpfdist/ -p 8001 -l /home/gpadmin/gpAdminLogs/gpfdist.log &  (wd: /export/gpdata/gpfdist)

==========================================================================

3. Create the external table in Greenplum:

create external table customer(name varchar(32), age int) location ('gpfdist://192.168.129.108:8787/interdir/dic_data/customer.txt') format 'text' (DELIMITER ',');

[gpadmin@bd129108 dic_data]$ cat customer.txt

wz,29

zhangsan,50

wangwu,19
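At this point the data can already be queried in place; a quick sanity check against the external table defined above (only a sketch):

select * from customer;
-- expected: the three rows of customer.txt, streamed through gpfdist at query time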

4. Load the data into an internal table:

create table customer_inter(name varchar(32), age int);

insert into customer_inter(name, age)  select name, age from customer;
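If a single statement is preferred, the create-plus-insert pair above can also be collapsed into a CREATE TABLE AS; this is only a sketch, and the table name customer_inter2 and the distribution key are illustrative choices, not part of the original steps:

create table customer_inter2 as
    select name, age from customer
    distributed by (name);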

5. Drop the temporary external table:
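The original note gives no statement here; a minimal sketch of the cleanup, assuming the table names from the earlier steps:

drop external table customer;   -- removes only the table definition; the file served by gpfdist is untouched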

==========================================================================

II. Using External Tables in Greenplum to Access Hive Data

Create the table in Hive and load data into it:

create table gp_test(id int,name string) row format delimited fields terminated by '\001' stored as textfile;

load data local inpath '/tmp/zyl/gp_test.txt' into table gp_test;
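Before defining the Greenplum side it is worth confirming where Hive actually stored the data; a quick check in the Hive CLI (a sketch, using the table created above):

describe formatted gp_test;     -- the Location field should show .../user/hive/warehouse/gp_test
select * from gp_test limit 3;  -- confirm the rows were loaded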

Create an external table in Greenplum that reads the data file on HDFS through the gphdfs protocol:

create external table gp_test (id int,name text) location ('gphdfs://192.168.129.106:8020/user/hive/warehouse/gp_test') format 'TEXT' (DELIMITER '\001');

Querying the table reports an error:

tydic=# select * from gp_test;

ERROR:  external table gphdfs protocol command ended with error. SLF4J: Class path contains multiple SLF4J bindings.  (seg4 slice1 192.168.129.120:40000 pid=6030)

DETAIL:

SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/jars/avro-tools-1.7.6-cdh5.8.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/jars/pig-0.12.0-cdh5.8.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerB

Command: execute:source $GPHOME/lib//hadoop/hadoop_env.sh;java $GP_JAVA_OPT -classpath $CLASSPATH com.emc.greenplum.gpdb.hdfsconnector.HDFSReader $GP_SEGMENT_ID $GP_SEGMENT_COUNT TEXT cdh4.1-gnet-1.2.0.0 'gphdfs://192.168.129.106:8020/user/hive/warehouse/gp_test/gp_test.txt' '000000002300044000000002500044' 'id,name,'

External table gp_test, file gphdfs://192.168.129.106:8020/user/hive/warehouse/gp_test/gp_test.txt

Fix: in the configuration files, a machine that is not part of the cluster had been added, which broke reading from HDFS; removing it from the configuration resolves the error.
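When gphdfs reads fail it can also help to confirm the Hadoop-related settings on the Greenplum side; gp_hadoop_target_version and gp_hadoop_home are the server parameters involved, and the value cdh4.1-gnet-1.2.0.0 seen in the error output is what this cluster was using. The checks below are only a sketch:

show gp_hadoop_target_version;  -- should match the Hadoop distribution, e.g. cdh4.1-gnet-1.2.0.0
show gp_hadoop_home;            -- must point to a working Hadoop client install on every segment host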

Create the internal table:

create table gp_test_inter(id int,name varchar(32));

insert into gp_test_inter(id, name)  select id, name from gp_test;

Comparing the internal and external tables:

Queries against the internal table are noticeably faster than queries against the external table.
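A minimal way to reproduce the comparison, assuming the two tables above; EXPLAIN ANALYZE reports the actual run time of each query:

explain analyze select count(*) from gp_test;        -- external table: data is re-read over gphdfs on every scan
explain analyze select count(*) from gp_test_inter;  -- internal table: data already resides in the segments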

References:

http://blog.csdn.net/jiangshouzhuang/article/details/51721884?locationNum=7&fps=1

http://www.htsjk.com/teradata/37107.html

