UNIT-4
Apache Pig
 Pig is a high-level programming language for analyzing large data sets. Pig was the result of a development effort at Yahoo!
 In the MapReduce framework, programs must be translated into a series of Map and Reduce stages. This is not a programming model that most data analysts are familiar with, so the Pig abstraction was built on top of Hadoop to bridge the gap.
 Apache Pig lets people focus on analyzing bulk data sets and spend less time writing MapReduce programs. Just as pigs eat anything, the Apache Pig programming language is designed to work on any kind of data. That's why the name, Pig!
Pig Architecture
The architecture of Pig consists of two components:
 Pig Latin, which is a language
 A runtime environment for running Pig Latin programs
A Pig Latin program consists of a series of operations or transformations applied to the input data to produce output. These operations describe a data flow, which the Hadoop Pig execution environment translates into an executable representation. Underneath, the results of these transformations are a series of MapReduce jobs of which the programmer is unaware. So, in a way, Pig in Hadoop allows the programmer to focus on the data rather than the nature of execution.
Pig Latin is a relatively straightforward language that uses familiar keywords from data processing, e.g., JOIN, GROUP, and FILTER.
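As an illustration of this data-flow style, here is a minimal Pig Latin sketch (file paths, relation names, and fields are hypothetical) that loads a file, filters and groups it, and stores the result; the execution environment translates each step below into MapReduce stages:

  -- load a tab-separated file of (name, age, city) records with an assumed schema
  users = LOAD '/data/users.txt' USING PigStorage('\t') AS (name:chararray, age:int, city:chararray);
  -- keep only the adult users
  adults = FILTER users BY age >= 18;
  -- group the remaining records by city
  by_city = GROUP adults BY city;
  -- count the users in each city
  counts = FOREACH by_city GENERATE group AS city, COUNT(adults) AS num_users;
  -- write the result back out
  STORE counts INTO '/output/user_counts';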
Apache Pig Architecture in Hadoop
 The Apache Pig architecture consists of a Pig Latin interpreter that uses Pig Latin scripts to process and analyze massive datasets. Programmers use the Pig Latin language to analyze large datasets in the Hadoop environment. Apache Pig has a rich set of operators for performing different data operations such as join, filter, sort, load, and group.
 Programmers write Pig Latin scripts to perform specific tasks. Pig converts these scripts into a series of MapReduce jobs, which eases the programmers' work. Pig Latin programs can be executed through several mechanisms, such as the Grunt shell, script files, and embedded programs, and can be extended with user-defined functions (UDFs).
The Apache Pig architecture consists of the following
major components:
 Parser
 Optimizer
 Compiler
 Execution Engine
 Execution Mode
Pig Latin Scripts
 Pig scripts are submitted to the Pig execution environment to produce the desired results. You can execute Pig scripts using one of the following methods (a brief sketch follows the list):
 Grunt Shell
 Script file
 Embedded script
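As a rough sketch of the first two methods (script and path names are hypothetical): the Grunt shell is an interactive prompt started by the pig command, while a script file bundles the same statements and is passed to pig on the command line. An embedded script runs Pig Latin from a host program (for example, Java).

  $ pig -x local
  grunt> users = LOAD '/data/users.txt' AS (name:chararray, age:int);
  grunt> DUMP users;

  $ pig myscript.pig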
Parser
The parser handles all Pig Latin statements and commands. It performs several checks on the statements, such as syntax and type checking, and generates a DAG (Directed Acyclic Graph) as output. The DAG represents the logical operators of the script as nodes and the data flow as edges.
Optimizer
Once parsing is complete and the DAG has been generated, the output is passed to the optimizer. The optimizer performs optimization activities on it, such as split, merge, projection, pushdown, transform, and reorder. By performing projection and pushdown, the optimizer omits unnecessary data and columns, which improves query performance.
Compiler
The compiler compiles the output generated by the optimizer into a series of MapReduce jobs. It automatically converts Pig jobs into MapReduce jobs and optimizes performance by rearranging the execution order where possible.
Execution Engine
After all of the above operations, the MapReduce jobs are submitted to the execution engine and run on the Hadoop platform to produce the desired results. You can then use the DUMP statement to display the results on screen, or STORE statements to store the results in HDFS (Hadoop Distributed File System).
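For example, once a relation has been computed (say, a hypothetical relation named counts), either statement below can be used; the output path is also hypothetical:

  -- print the relation to the screen (useful for small results and debugging)
  DUMP counts;
  -- or write it to HDFS, using a comma as the field delimiter
  STORE counts INTO '/output/user_counts' USING PigStorage(',');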
Execution Mode
Apache Pig runs in two execution modes: local and MapReduce. The choice of execution mode depends on where the data is stored and where you want to run the Pig script. You can either store your data locally (on a single machine) or in a distributed Hadoop cluster environment.
Local Mode – Use local mode if your dataset is small. In local mode, Pig runs in a single JVM using the local host and local file system. In this mode, parallel mapper execution is not possible because everything runs on the local host. Use the pig -x local command to specify local mode.
MapReduce Mode – Apache Pig uses MapReduce mode by default. In MapReduce mode, a programmer executes Pig Latin statements on data that is already stored in HDFS (the Hadoop Distributed File System). Use the pig -x mapreduce command to specify MapReduce mode.
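Assuming Pig is installed and on the PATH, the two modes are selected as follows (the script name is hypothetical):

  # run a script in local mode, against the local file system
  pig -x local myscript.pig
  # run the same script in MapReduce mode, against data in HDFS (also the default)
  pig -x mapreduce myscript.pig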
Apache Pig Components
Parser
Initially, Pig scripts are handled by the parser. It checks the syntax of the script and performs type checking and other miscellaneous checks.
The output of the parser will be a DAG (directed acyclic graph),
which represents the Pig Latin statements and logical operators.
In the DAG, the logical operators of the script are represented as the
nodes and the data flows are represented as edges.
Optimizer
The logical plan (DAG) is passed to the logical optimizer, which carries
out the logical optimizations such as projection and pushdown.
Compiler
The compiler compiles the optimized logical plan into a series of
MapReduce jobs.
Execution engine
Finally, the MapReduce jobs are submitted to Hadoop in sorted order, and they are executed on Hadoop to produce the desired results.
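To see these stages for yourself, Pig's EXPLAIN statement prints the logical, physical, and MapReduce plans produced for a relation, and ILLUSTRATE passes a small data sample through each step; the relation name below is hypothetical:

  -- assume 'counts' is a relation defined earlier in the session or script
  EXPLAIN counts;
  ILLUSTRATE counts;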
Pig Latin Data Model
 The data model of Pig Latin is fully nested and allows complex non-atomic datatypes such as map and tuple. The main types are described below.
 Atom
Any single value in Pig Latin, irrespective of its data type, is known as an Atom. It is stored as a string and can be used as a string or a number. int, long, float, double, chararray, and bytearray are the atomic types of Pig. A piece of data or a simple atomic value is known as a field.
Example − ‘raja’ or ‘30’
 Tuple
A record formed by an ordered set of fields is known as a tuple; the fields can be of any type. A tuple is similar to a row in an RDBMS table.
Example − (Raja, 30)
 Bag
A bag is an unordered set of tuples. In other words, a collection of tuples (non-unique) is known as a
bag. Each tuple can have any number of fields (flexible schema). A bag is represented by ‘{}’. It is
similar to a table in RDBMS, but unlike a table in RDBMS, it is not necessary that every tuple
contain the same number of fields or that the fields in the same position (column) have the same
type.
Example − {(Raja, 30), (Mohammad, 45)}
A bag can be a field in a relation; in that context, it is known as inner bag.
Example − (Raja, 30, {(9848022338, raja@gmail.com)})
 Map
A map (or data map) is a set of key-value pairs. The key needs to be of type chararray and should be unique. The value can be of any type. A map is represented by '[]'.
Example − [name#Raja, age#30]
 Relation
A relation is a bag of tuples. The relations in Pig Latin are unordered (there is no guarantee that tuples
are processed in any particular order).
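These types can be declared directly in a LOAD schema. A minimal sketch, with hypothetical file and field names:

  -- each record holds two atoms, a tuple, a bag of tuples, and a map
  people = LOAD '/data/people.txt'
      AS (name:chararray,
          age:int,
          address:tuple(street:chararray, city:chararray),
          phones:bag{t:tuple(number:chararray)},
          extras:map[]);
  -- 'people' itself is a relation, i.e. a bag of such tuples
  DUMP people;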
MapReduce vs. Apache Pig
 Apache Pig is a scripting language; MapReduce is a compiled language.
 Apache Pig provides a higher level of abstraction; MapReduce provides a low level of abstraction.
 Apache Pig requires few lines of code (10 lines of Pig can summarize 200 lines of MapReduce code); MapReduce requires more extensive code.
 Apache Pig requires less development time and effort; MapReduce requires more development time and effort.
 Apache Pig has lower code efficiency; MapReduce code is more efficient than Apache Pig code.
Apache Pig Features
 Allows programmers to write fewer lines of code. Programmers can express roughly 200 lines of Java code in only ten lines of Pig Latin.
 Apache Pig's multi-query approach reduces development time (see the sketch after this list).
 Apache Pig has a rich set of operators for performing operations like join, filter, sort, load, and group.
 The Pig Latin language is similar to SQL. Programmers with good SQL knowledge find it easy to write Pig scripts.
 Apache Pig handles both structured and unstructured data analysis.
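As an illustration of the multi-query approach (file names and fields are hypothetical), the script below contains two STORE statements; Pig plans them together, so the shared LOAD and the scan of the input are not repeated for each output:

  logs = LOAD '/data/logs.txt' AS (user:chararray, url:chararray, bytes:long);
  by_user = GROUP logs BY user;
  bytes_per_user = FOREACH by_user GENERATE group AS user, SUM(logs.bytes) AS total_bytes;
  by_url = GROUP logs BY url;
  hits_per_url = FOREACH by_url GENERATE group AS url, COUNT(logs) AS hits;
  -- both outputs are computed in one batched run where possible
  STORE bytes_per_user INTO '/output/bytes_by_user';
  STORE hits_per_url INTO '/output/hits_by_url';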
Apache Pig Applications
 Processes large volumes of data
 Supports quick prototyping and ad-hoc queries across large datasets
 Performs data processing in search platforms
 Processes time-sensitive data loads
 Used by telecom companies to de-identify user call data
 https://www.geeksforgeeks.org/apache-pig-installation-on-windows-and-case-study/