V.SAKTHIPRIYA
II-MSC (IT)
NADAR SARASWATHI COLLEGE OF ARTS AND
SCIENCE
APACHE HIVE
 Apache Hive is a data warehouse system built on top of
Hadoop for analyzing structured and semi-structured
data. Hive abstracts away the complexity of Hadoop
MapReduce: it provides a mechanism to project structure
onto the data and to run queries written in HQL (Hive
Query Language), which is similar to SQL. Internally,
the Hive compiler converts these HQL queries into
MapReduce jobs.
INTRODUCTION:
 HiveQL is the Hive Query Language.
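 Hive traditionally offered no support for row-level inserts,
updates, and deletes, and did not support transactions.
 Later releases (0.14 onward) add limited support for these
through transactional (ACID) tables.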
 Hive adds extensions to provide better performance in the
context of Hadoop and to integrate with custom
extensions and even external programs.
INTRODUCTION (CONTINUED):
 DDL and DML are the two parts of HiveQL.
 Data Definition Language (DDL) is used for creating,
altering and dropping databases, tables, views, functions
and indexes.
 Data Manipulation Language (DML) is used to put data into
Hive tables, to extract data to the file system, and to
explore and manipulate data with queries, grouping,
filtering, joining, etc.
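The split between DDL and DML can be sketched in HiveQL as follows. This is a minimal illustration, not taken from the slides: the table name employees, its columns, and both file paths are hypothetical.

```sql
-- DDL: define a table (hypothetical name and columns).
CREATE TABLE IF NOT EXISTS employees (
  empid INT,
  ename STRING,
  dept  STRING,
  esal  DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- DML: put data into the table from a local file (hypothetical path).
LOAD DATA LOCAL INPATH '/tmp/employees.tsv' INTO TABLE employees;

-- DML: explore the data with filtering and grouping.
SELECT dept, COUNT(*) AS headcount, AVG(esal) AS avg_salary
FROM employees
WHERE esal > 0
GROUP BY dept;

-- DML: extract query results to the file system (hypothetical directory).
INSERT OVERWRITE DIRECTORY '/tmp/dept_summary'
SELECT dept, AVG(esal) FROM employees GROUP BY dept;
```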
DATABASES IN HIVE:
 A database in Hive is essentially just a catalog or
namespace of tables.
 Databases are very useful for larger clusters with multiple
teams and users, as a way of avoiding table name collisions.
DATABASES IN HIVE (CONTINUED):
 Hive provides commands such as
 CREATE DATABASE db_name -- to create a database in
Hive.
 USE db_name -- to switch to a database in Hive.
 DROP DATABASE db_name -- to delete a database in Hive.
 SHOW DATABASES -- to list the databases.
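Put together, a short session using these commands might look like the sketch below. The database name sales_db is hypothetical; IF NOT EXISTS, IF EXISTS and CASCADE are optional clauses.

```sql
-- Create a database (hypothetical name) only if it is not already there.
CREATE DATABASE IF NOT EXISTS sales_db
COMMENT 'Tables owned by the sales team';

-- List all databases known to the metastore.
SHOW DATABASES;

-- Make sales_db the current database for subsequent statements.
USE sales_db;

-- Switch away before dropping, then drop the database.
-- CASCADE also drops any tables it still contains; without it,
-- the DROP fails when the database is not empty.
USE default;
DROP DATABASE IF EXISTS sales_db CASCADE;
```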
TABLES IN HIVE:
 A Hive table is logically made up of the data being stored and the
associated metadata describing the layout of the data in the
table.
 The data typically resides in HDFS, although it may reside on
any Hadoop file system including the local file system.
 Hive stores the metadata in a relational database and not
in HDFS.
 The command for creating a table in Hive is
 hive> CREATE TABLE emp (empid INT, ename STRING, esal
DOUBLE);
 In Hive, there are two types of tables:
 Managed tables
 External tables
MANAGED TABLES
 Managed tables are tables that Hive manages inside its
warehouse: whenever we create a managed table, its data
is stored under the default Hive warehouse location,
i.e. /user/hive/warehouse.
 When we drop a managed table, Hive deletes the data in
the table along with its metadata.
 Managed tables are less convenient for sharing with other
tools.
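The behaviour above can be checked with a short sketch. The table name emp_managed is hypothetical, and the reported location depends on how the warehouse directory is configured.

```sql
-- Create a managed table (no EXTERNAL keyword, no LOCATION clause).
CREATE TABLE emp_managed (empid INT, ename STRING, esal DOUBLE);

-- Inspect the metadata: the output includes a Location row pointing
-- into the warehouse directory and a Table Type row reading MANAGED_TABLE.
DESCRIBE FORMATTED emp_managed;

-- Dropping a managed table removes both its metadata and its data files.
DROP TABLE emp_managed;
```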
SYNTAX FOR CREATING A HIVE
MANAGED TABLE:
 hive> CREATE TABLE managed_tab (empid INT, ename STRING, esal INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED
BY '\n' STORED AS TEXTFILE;
 How to load the data into managed tables
 We can load the data in two ways
 Local Mode
 HDFS Mode
 In local mode, the syntax is
hive> LOAD DATA LOCAL INPATH '/home/newBatch/input1.txt'
INTO TABLE managed_tab;
 In HDFS mode, the syntax is
hive> LOAD DATA INPATH '/user/ramesh/Hive/input2.txt' INTO
TABLE managed_tab;
EXTERNAL TABLES:
 Along with managed tables, Hive also supports external
tables.
 Whenever the keyword EXTERNAL appears in the table
definition, Hive does not take ownership of the table's
data, i.e. the external table is not managed by
the Hive warehouse system.
 Along with the EXTERNAL keyword, we can also specify a
LOCATION in the table definition, which states exactly
where the table's data is stored.
SYNTAX:
 hive> CREATE EXTERNAL TABLE external_tab (empid INT, ename
STRING, esal DOUBLE) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE
LOCATION '/user/Ramesh/Hive-external';
 The location directory is created automatically if it does not exist.
 Loading data into external tables:
 Loading data from HDFS into the external table:
 hive> LOAD DATA INPATH '/Ramesh/inputdata.txt' INTO TABLE
external_tab;
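One practical consequence of a table being external is that dropping it removes only the metadata, not the files. The sketch below illustrates this under assumed names: the table emp_external and the directory /data/emp_external are hypothetical.

```sql
-- Define an external table over a directory that Hive does not own.
CREATE EXTERNAL TABLE IF NOT EXISTS emp_external (
  empid INT,
  ename STRING,
  esal  DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/emp_external';

-- Dropping the table deletes only the metastore entry;
-- the files under /data/emp_external are left in place.
DROP TABLE emp_external;

-- Re-creating the table over the same location makes the
-- existing files queryable again without reloading them.
CREATE EXTERNAL TABLE emp_external (
  empid INT,
  ename STRING,
  esal  DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/emp_external';
```

This is why external tables suit data that other tools also read or write.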
ALTERING TABLE:
 Most table properties can be altered with the ALTER
TABLE statement, which changes metadata about the table
but not the data itself.
 ALTER TABLE modifies table metadata only.
 These statements can be used to fix mistakes in the schema,
move partition locations and do other operations.
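Two metadata-only changes of this kind are sketched below. The sketch assumes a table named log_messages that is partitioned by a string column log_date; the property key, the partition value and the HDFS URI are all hypothetical.

```sql
-- Attach or update a free-form table property (metadata only).
ALTER TABLE log_messages
SET TBLPROPERTIES ('notes' = 'Raw application logs, loaded nightly');

-- Point one partition at a different directory. Only the metastore
-- entry changes; no files are moved or copied by this statement.
ALTER TABLE log_messages PARTITION (log_date = '2012-01-02')
SET LOCATION 'hdfs://namenode:8020/data/logs/2012-01-02';
```

Because neither statement touches the data, any files must already be in the new location for queries on that partition to return rows.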
RENAMING A TABLE:
 This statement renames the table log_messages
to logmsgs.
 Cmd:
ALTER TABLE log_messages RENAME TO logmsgs;
CHANGING COLUMNS
 You can rename a column, change its position, type or
comment.
 Syntax:
 ALTER TABLE log_messages CHANGE COLUMN hms
hours_minutes_seconds INT
COMMENT 'The hours, minutes, and seconds part of the
timestamp' AFTER severity;
 You have to specify the old name, a new name and the
type even if the name or type is not changed.
ADDING COLUMNS
 You can add new columns to the end of the existing
columns, before any partition columns.
 Example: ALTER TABLE log_messages ADD
COLUMNS (app_name STRING COMMENT 'Application
name', session_id BIGINT);
DELETING OR REPLACING COLUMNS:
 The REPLACE COLUMNS statement can only be used with tables
that use one of the native SerDe modules: DynamicSerDe
or MetadataTypedColumnsetSerDe.
 The SerDe determines how records are parsed into columns
(deserialization) and how a record's columns are stored
(serialization).
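A minimal sketch of replacing a table's column list is shown below, assuming the log_messages table from the earlier slides. The new column names and comments are illustrative only.

```sql
-- Replace the entire column list of the table. Columns that are not
-- listed here are removed from the schema; only metadata changes,
-- so the underlying data files are left untouched.
ALTER TABLE log_messages REPLACE COLUMNS (
  hours_mins_secs INT    COMMENT 'Hour, minute and seconds from the timestamp',
  severity        STRING COMMENT 'The message severity',
  message         STRING COMMENT 'The rest of the message'
);
```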
PARTITIONS:
 A table may be partitioned in multiple dimensions.
 Partitions are defined at table creation time using the
PARTITIONED BY clause, which takes a list of column
definitions.
 If we need to search a large amount of data, dividing it into
partitions lets a query read only the relevant partitions.
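A table partitioned in two dimensions can be sketched as follows. The table name logs_partitioned, its columns and the partition values are hypothetical.

```sql
-- Partition columns are declared in PARTITIONED BY, not in the main
-- column list; each distinct (log_date, region) pair gets its own directory.
CREATE TABLE logs_partitioned (
  ts       BIGINT,
  severity STRING,
  message  STRING
)
PARTITIONED BY (log_date STRING, region STRING);

-- Register one partition explicitly.
ALTER TABLE logs_partitioned
ADD IF NOT EXISTS PARTITION (log_date = '2012-01-02', region = 'eu');

-- List the partitions Hive knows about.
SHOW PARTITIONS logs_partitioned;

-- Filtering on partition columns lets Hive skip every other partition.
SELECT severity, COUNT(*) AS n
FROM logs_partitioned
WHERE log_date = '2012-01-02' AND region = 'eu'
GROUP BY severity;
```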
BUCKETS:
 There are two reasons why you might want to
organize your tables (or partitions) into buckets.
 The first is to enable more efficient queries.
 The second reason to bucket a table is to make
sampling more efficient.
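Both uses can be sketched briefly. The table name users_bucketed, its columns and the bucket count of 4 are hypothetical choices for illustration.

```sql
-- Rows are assigned to one of 4 buckets by hashing the id column.
-- Note: older Hive releases need "SET hive.enforce.bucketing = true;"
-- before inserting, so that data is actually written out in buckets.
CREATE TABLE users_bucketed (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;

-- Sampling: read roughly a quarter of the rows by
-- selecting a single bucket out of the four.
SELECT * FROM users_bucketed
TABLESAMPLE (BUCKET 1 OUT OF 4 ON id);
```

The efficiency gain for queries comes mainly from joins: two tables bucketed on the same column can be joined bucket by bucket.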
