APACHE DRILL WITH ORACLE, HIVE AND HBASE
Prepared By: Nag Arvind Gudiseva
PROBLEM STATEMENT
Create a data pipeline by analysing data from multiple data sources and persisting the result as a JSON document.
ARCHITECTURAL SOLUTION
Use Apache Drill storage plugins to connect to RDBMS sources (MySQL, Oracle, etc.), NoSQL and Hadoop data stores (MongoDB, Hive, HBase, etc.) and text documents (JSON, CSV, etc.). Analyse the data in the tables (or, for text documents, in schemas discovered dynamically on the fly) and leverage the Apache Drill API to combine data from different tables (or text documents) across data sources on the fly. Apache Drill also exposes a REST web service, which can be consumed with a Java Jersey REST client program: call the POST method, submit Drill queries as the request object, and receive the response in JSON format, which can then be persisted on the local file system.
PICTORIAL ILLUSTRATION
INSTALLATION STEPS ON UBUNTU 14.04 VM
1. Download Apache Drill using the wget command
wget http://mirror.symnds.com/software/Apache/drill/drill-1.4.0/apache-drill-1.4.0.tar.gz
2. Untar and extract
tar -xvzf apache-drill-1.4.0.tar.gz
3. Move the folder to a preferred location
sudo mv apache-drill-1.4.0 /usr/local/apache-drill
4. Install ZooKeeper (a command sketch follows this list):
a. Download the stable version (zookeeper-3.4.6.tar.gz) from http://hadoop.apache.org/zookeeper/releases.html
b. Untar and move the folder to a preferred location
c. Rename zoo_sample.cfg to zoo.cfg
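A minimal command sketch of step 4, assuming /usr/local/zookeeper as the install location (the mirror URL is illustrative):
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
tar -xvzf zookeeper-3.4.6.tar.gz
sudo mv zookeeper-3.4.6 /usr/local/zookeeper
cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg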
STARTING DRILL
a. EMBEDDED MODE (with SqlLine)
<DRILL_HOME>/bin/sqlline -u jdbc:drill:zk=local
(OR)
./bin/drill-embedded
b. DISTRIBUTED MODE (Start ZooKeeper and Drill Bit)
<ZOOKEEPER_HOME>/bin/zkServer.sh start
<ZOOKEEPER_HOME>/bin/zkServer.sh status
(AND)
<DRILL_HOME>/bin/drillbit.sh start
<DRILL_HOME>/bin/drillbit.sh status
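Once the Drillbit is running, a SQLLine session can attach to the cluster through ZooKeeper (assuming ZooKeeper on its default port 2181):
<DRILL_HOME>/bin/sqlline -u jdbc:drill:zk=localhost:2181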
STOPPING DRILL (AND ZOOKEEPER)
a. EMBEDDED MODE (with SqlLine)
0: jdbc:drill:zk=local> !quit
b. DISTRIBUTED MODE (Stop Drill Bit and ZooKeeper)
<DRILL_HOME>/bin/drillbit.sh stop
<DRILL_HOME>/bin/drillbit.sh status
(AND)
<ZOOKEEPER_HOME>/bin/zkServer.sh stop
<ZOOKEEPER_HOME>/bin/zkServer.sh status
JAR DEFAULT QUERIES
REFERENCE: the cp.* sample tables below are read from <DRILL_HOME>/jars/3rdparty/foodmart-data-json-0.4.jar, which Drill bundles on its classpath
0: jdbc:drill:zk=local> show databases;
0: jdbc:drill:zk=local> select employee_id, first_name, last_name, position_id, salary FROM cp.`employee.json` where salary > 30000;
0: jdbc:drill:zk=local> select employee_id, first_name, last_name, position_id, salary FROM cp.`employee.json` where salary > 30000 and position_id = 2;
0: jdbc:drill:zk=local> select emp.employee_id, emp.first_name, emp.salary, emp.department_id FROM cp.`employee.json` emp where emp.salary < 40000 and emp.salary > 21000;
0: jdbc:drill:zk=local> select emp.employee_id, emp.first_name, emp.salary, emp.department_id, dept.department_description FROM cp.`employee.json` emp, cp.`department.json` dept where emp.salary < 40000 and emp.salary > 21000 and emp.department_id = dept.department_id;
JSON SAMPLE QUERIES
SELECT * from dfs.`/home/gudiseva/arvind/zips.json` LIMIT 10;
CSV SAMPLE QUERIES
select * FROM dfs.`/home/gudiseva/arvind/sample.csv`;
select columns[0] as id, columns[1] as name, columns[2] as weight, columns[3] as height FROM dfs.`/home/gudiseva/arvind/sample.csv`;
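For reference, a hypothetical sample.csv matching the columns[0]..columns[3] positions above could look like this (Drill exposes each CSV row as a columns array):
101,John,70,175
102,Jane,55,160
103,Ravi,65,170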
CREATING VIEW BY QUERYING MULTIPLE DATA SOURCES
CREATE or REPLACE view dfs.tmp.MULTI_VIEW as select emp.employee_id, phy.columns[1] as Name, dept.department_description, phy.columns[2] as Weight, phy.columns[3] as Height FROM cp.`employee.json` emp, cp.`department.json` dept, dfs.`/home/gudiseva/arvind/sample.csv` phy where CAST(emp.employee_id AS INT) = CAST(phy.columns[0] AS INT) and emp.department_id = dept.department_id;
SELECT * FROM dfs.tmp.MULTI_VIEW;
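The view lives in the dfs.tmp workspace and can be dropped when no longer needed:
DROP VIEW dfs.tmp.MULTI_VIEW;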
OPTIMIZATION CONFIGURATIONS
<DRILL_HOME>/conf/drill-env.sh
DRILL_MAX_DIRECT_MEMORY="1G"
DRILL_HEAP="512M"
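DRILL_MAX_DIRECT_MEMORY caps the off-heap (direct) memory used by the Drillbit and DRILL_HEAP caps its JVM heap; the small values above suit a single-node Ubuntu VM rather than a production cluster.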
STORAGE PLUGINS
MYSQL
{
"type": "jdbc",
"driver": "com.mysql.jdbc.Driver",
"url": "jdbc:mysql://localhost",
"username": "root",
"password":"root",
"enabled": true
}
select * from mysql.userdb.`employee`;
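For the jdbc storage plugins above and below, the vendor's JDBC driver jar (for example mysql-connector-java-5.1.x.jar for MySQL, or the Oracle ojdbc jar) must first be copied into <DRILL_HOME>/jars/3rdparty so that Drill can load the driver class.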
ORACLE
{
"type": "jdbc",
"driver": "oracle.jdbc.OracleDriver",
"url": "jdbc:oracle:thin:MY_APPL/mayura_123@dbs-nprd2-vm-004.mayura.com:1523/N2S004I",
"username": "MY_APPL",
"password":"mayura_123",
"enabled": true
}
select * from oracle.MY_APPL.`emp`;
HIVE
{
"type": "hive",
"enabled": false,
"configProps":{
"hive.metastore.uris": "thrift://localhost:10000",
"javax.jdo.option.ConnectionURL": "jdbc:derby://localhost:1527/metastore_db;create=true",
"hive.metastore.warehouse.dir": "/user/hive/warehouse",
"fs.default.name": "file:///",
"hive.metastore.sasl.enabled": "false"
}
}
select * from hive.arvind.`employee`;
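Two caveats, assuming a default local setup: "enabled" is false in the plugin JSON above, so it must still be switched on (for instance from the Storage tab of the Drill web UI at http://localhost:8047) before this query will run; and hive.metastore.uris conventionally points at the Hive metastore thrift service (default port 9083), whereas 10000 is the Hive Server port used in the note below, so the URI should match whichever service is actually running.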
NOTE:
The Hive Server must be started before querying:
$ hive --service hiveserver --verbose
[The hive shell will not work while the Hive Server is running]
MONGODB
{
"type": "mongo",
"connection": "mongodb://first_name:last_name@ds048537.mongolab.com:48537/m101",
"enabled": true
}
select `_id`, `value` from mongo.m101.`storm`;
HBASE
{
"type": "hbase",
"config": {
"hbase.zookeeper.quorum": "localhost",
"hbase.zookeeper.property.clientPort": "2181"
},
"size.calculator.enabled":false,
"enabled": true
}
SELECT CONVERT_FROM(row_key, 'UTF8') AS empid,
CONVERT_FROM(emp.personal_data.city, 'UTF8') AS city,
CONVERT_FROM(emp.personal_data.name, 'UTF8') AS name,
CONVERT_FROM(emp.professional_data.designation, 'UTF8') AS designation,
CONVERT_FROM(emp.professional_data.salary, 'UTF8') AS salary
FROM hbase.`emp`;
RELOAD .BASHRC:
source ~/.bashrc
(OR)
. ~/.bashrc
HBASE SAMPLE QUERIES
select * from hbase.`emp`;
SELECT CONVERT_FROM(row_key, 'UTF8') AS empid FROM hbase.`emp`;
SELECT CONVERT_FROM(row_key, 'UTF8') AS empid, CONVERT_FROM(emp.personal_data.city, 'UTF8') AS city FROM
hbase.`emp`;
SELECT CONVERT_FROM(emp.personal_data.city, 'UTF8') AS city, CONVERT_FROM(emp.personal_data.name, 'UTF8') As name
FROM hbase.`emp`;
SELECT CONVERT_FROM(row_key, 'UTF8') AS empid, CONVERT_FROM(emp.personal_data.city, 'UTF8') AS city, CONVERT_FROM(emp.personal_data.name, 'UTF8') AS name, CONVERT_FROM(emp.professional_data.designation, 'UTF8') AS designation, CONVERT_FROM(emp.professional_data.salary, 'UTF8') AS salary FROM hbase.`emp`;
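All of the emp queries assume an HBase table with column families personal_data and professional_data; a hypothetical table can be created and seeded from the hbase shell like this:
create 'emp', 'personal_data', 'professional_data'
put 'emp', '1', 'personal_data:name', 'John'
put 'emp', '1', 'personal_data:city', 'Hyderabad'
put 'emp', '1', 'professional_data:designation', 'manager'
put 'emp', '1', 'professional_data:salary', '50000'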
ORACLE, HIVE AND HBASE (UNION ALL) QUERIES
select id, name, salary from mysql.userdb.`employee` union all select id, first, salary from oracle.MY_APPL.`emp`;
SELECT EID AS ID, NAME AS NAME, SALARY AS SALARY FROM hive.arvind.`employee` WHERE DESTINATION LIKE '%manager%'
UNION ALL
SELECT CONVERT_FROM(row_key, 'UTF8') AS ID, CONVERT_FROM(emp.personal_data.name, 'UTF8') AS NAME,
CONVERT_FROM(emp.professional_data.salary, 'UTF8') AS SALARY FROM hbase.`emp` WHERE
CONVERT_FROM(emp.professional_data.designation, 'UTF8') LIKE '%manager%';
SELECT EID AS ID, NAME AS NAME, TO_NUMBER(SALARY, '######') AS SALARY FROM hive.arvind.`employee` WHERE
DESTINATION LIKE '%manager%'
UNION ALL
SELECT ID AS ID, FIRST AS NAME, SALARY AS SALARY FROM oracle.MY_APPL.`emp`
UNION ALL
SELECT CONVERT_FROM(row_key, 'UTF8') AS ID, CONVERT_FROM(emp.personal_data.name, 'UTF8') AS NAME,
TO_NUMBER(emp.professional_data.salary, '######') AS SALARY FROM hbase.`emp` WHERE
CONVERT_FROM(emp.professional_data.designation, 'UTF8') LIKE '%manager%';
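TO_NUMBER is applied on both the Hive and HBase branches so that every arm of the UNION ALL yields SALARY with a compatible numeric type; without it, the character salary decoded from HBase would not line up with the other branches.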
DRILL REST WEBSERVICE INTERFACE
1. Install the RESTClient extension in Firefox
2. In the Firefox browser, open the REST Client: chrome://restclient/content/restclient.html
3. Set the Request object:
Method: POST
URL: http://localhost:8047/query.json
Header: Content-Type: application/json
Body:
{
"queryType" : "SQL",
"query": "SELECT * from dfs.`/home/gudiseva/arvind/zips.json` LIMIT10"
}
4. Response object received:
Response Headers:
Status Code: 200 OK
Content-Length: 1377
Content-Type: application/json
Server: Jetty(9.1.5.v20140505)
Response Body:
{
"columns": [ "_id", "city", "loc", "pop", "state" ],
"rows" : [ {
"_id" : "01001",
"state" : "MA",
"loc" : "[-72.622739,42.070206]",
"pop" : "15338",
"city" : "AGAWAM"
}, {
"_id" : "01002",
"state" : "MA",
"loc" : "[-72.51565,42.377017]",
"pop" : "36963",
"city" : "CUSHMAN"
} ]
}
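The same request can also be fired from the command line with curl (a sketch, assuming Drill's web server on its default port 8047):
curl -X POST -H "Content-Type: application/json" -d '{"queryType": "SQL", "query": "SELECT * from dfs.`/home/gudiseva/arvind/zips.json` LIMIT 10"}' http://localhost:8047/query.json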
FIGURE ILLUSTRATION OF REST UI CLIENT
JAVA SAMPLE PROGRAMS
DRILL JDBC API (WITH DRILL JDBC DRIVER)
package nag.arvind.gudiseva;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
public class DrillHiveOracleHBase {
    public static void main(String[] args) {
        try {
            // Register the Drill JDBC driver and connect through ZooKeeper
            Class.forName("org.apache.drill.jdbc.Driver");
            Connection conn = DriverManager.getConnection("jdbc:drill:zk=localhost:2181", "", "");
            Statement stmt = conn.createStatement();
            // UNION ALL across Hive, Oracle and HBase; TO_NUMBER aligns the salary types
            String sql = "SELECT EID AS ID, NAME AS NAME, TO_NUMBER(SALARY, '######') AS SALARY "
                    + "FROM hive.arvind.`employee` WHERE DESTINATION LIKE '%manager%' "
                    + "UNION ALL "
                    + "SELECT ID AS ID, FIRST AS NAME, SALARY AS SALARY FROM oracle.MY_APPL.`emp` "
                    + "UNION ALL "
                    + "SELECT CONVERT_FROM(row_key, 'UTF8') AS ID, "
                    + "CONVERT_FROM(emp.personal_data.name, 'UTF8') AS NAME, "
                    + "TO_NUMBER(emp.professional_data.salary, '######') AS SALARY "
                    + "FROM hbase.`emp` "
                    + "WHERE CONVERT_FROM(emp.professional_data.designation, 'UTF8') LIKE '%manager%'";
            ResultSet rs = stmt.executeQuery(sql);
            System.out.println("ID" + "\t" + "NAME" + "\t" + "SALARY");
            System.out.println("--" + "\t" + "----" + "\t" + "------");
            while (rs.next()) {
                String id = rs.getString("ID");
                String name = rs.getString("NAME");
                int salary = rs.getInt("SALARY");
                System.out.println(id + "\t" + name + "\t" + salary);
            }
            rs.close();
            stmt.close();
            conn.close();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
DRILL JDBC JARS:
drill-jdbc-all-1.4.0.jar
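This jar ships with the Drill distribution under <DRILL_HOME>/jars/jdbc-driver/ and only needs to be placed on the client application's classpath.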
DRILL REST WEBSERVICE (WITH JERSEY REST CLIENT)
package web.service.rest;
import java.io.FileWriter;
import java.io.IOException;
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;
import com.sun.jersey.api.client.WebResource.Builder;
public class JerseyClientPost {
    public static void main(String[] args) {
        try {
            Client client = Client.create();
            WebResource webResource = client.resource("http://localhost:8047/query.json");
            // Drill REST request body; the quotes inside the JSON are escaped
            String input = "{"
                    + "\"queryType\": \"SQL\""
                    + ","
                    + "\"query\": \"SELECT * from dfs.`/home/gudiseva/arvind/zips.json` LIMIT 10\""
                    + "}";
            Builder builder = webResource.type("application/json");
            ClientResponse response = builder.post(ClientResponse.class, input);
            if (response.getStatus() != 200) {
                throw new RuntimeException("Failed: HTTP error code: " + response.getStatus());
            }
            System.out.println("Output from Server .... \n");
            String output = response.getEntity(String.class);
            System.out.println(output);
            // Persist the JSON response on the local file system
            try {
                FileWriter file = new FileWriter("/tmp/JSON/aix_input.json");
                file.write(output);
                file.flush();
                file.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
JERSEY JARS DOWNLOAD:
https://sites.google.com/a/dineshonjava.com/dineshonjava/dineshonjava/RESTClient.zip?attredirects=0&d=1
jersey-bundle-1.14.jar
REFERENCES
http://ranafaisal.info/2015/05/13/install-apache-drill-on-ubuntu-14-04/
http://www.devinline.com/2015/11/apache-drill-setup-and-SQL-query-execution.html
http://www.devinline.com/2015/11/connect-jdbc-client-to-apache-drill.html