EDUREKA HADOOP CERTIFICATION TRAINING (www.edureka.co/big-data-and-hadoop)
Agenda for Today’s Session
• Introduction to Apache Pig
• Pig vs MapReduce
• Twitter Case Study on Apache Pig
• Apache Pig Architecture
• Pig Components
• Pig Data Model & Operators
• Running Pig Commands and Pig Scripts (Log Analysis)
MapReduce Way
In MapReduce, you need to write a program
in Java/Python to process the data.
What if you come from a non-programming background?
Are your Hadoop days over before they even started?
No need to worry at all!
There are multiple tools in the Hadoop ecosystem that do not require a programming background.
And in today’s session, I will tell you about one such tool!
Apache Pig
• An open-source, high-level dataflow system
• Introduced by Yahoo
• Provides an abstraction over MapReduce
• Two main components: the Pig Latin language and the Pig execution environment
Fun fact:
• 10 lines of Pig Latin = approx. 200 lines of MapReduce Java code
Why go for PIG when MR is there?
1/20 the lines of code, 1/16 the development time
(Bar charts comparing MapReduce and Pig on lines of code and on development time in minutes.)
Apache Pig vs MapReduce
Apache Pig:
• High-level data flow tool
• No need to write complex programs
• Built-in support for data operations like joins, filters, ordering, sorting, etc.
• Provides nested data types like tuples, bags, and maps
MapReduce:
• Low-level data processing paradigm
• You need to write programs in Java, Python, etc.
• Performing data operations in MapReduce is a humongous task
• Nested data types are not available in MapReduce
Some more reasons to
choose Apache Pig
Why Apache Pig?
• Provides common data operations (filters, joins, ordering, etc.) and nested data types (tuples, bags, and maps) that are missing from MapReduce.
• Open source and actively supported by a community of developers.
• Can take any data: structured, semi-structured or unstructured.
• Data flow language: reads like a series of steps.
• Easy to learn, easy to write and easy to read.
• Extensible by UDFs (User Defined Functions) written in Java, Python, JavaScript or Ruby.
• An ad-hoc way of creating and executing MapReduce jobs on very large data sets.
Twitter Case Study
• Twitter’s data was growing at an accelerating rate (10 TB/day).
• Thus, Twitter decided to move the archived data to HDFS and adopt Hadoop to extract business value from it.
• Their major aim was to analyse the data stored in Hadoop and come up with insights on a daily, weekly or monthly basis.
Let me talk about one of the insights they wanted.
How many tweets are stored per user in the given tweet tables?
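To make the question concrete, here is a minimal sketch in plain Python (not Pig Latin) of the computation being asked for: group the tweet rows by user and count each group. The row layout `(tweet_id, user_id, text)` and the sample values are assumptions made for illustration; the slides do not show the actual table schema.

```python
from collections import Counter

# Hypothetical rows of a tweet table: (tweet_id, user_id, text).
tweets = [
    (1, "alice", "hello"),
    (2, "bob", "pig latin"),
    (3, "alice", "hadoop"),
]


def tweets_per_user(rows):
    """Count tweets per user: group the rows by user_id, then count each group."""
    return Counter(user_id for _tweet_id, user_id, _text in rows)


counts = tweets_per_user(tweets)
for user, n in counts.items():
    print(user, n)
```

In Pig the same idea is expressed as a grouping step followed by a count; the Python version only shows what result is expected, not how Pig distributes the work.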
High Level Implementation
Twitter Database: Tweet Table, User Table
1. Put the tables on HDFS
2. Load the data in Pig
3. Process the data in Pig and store the result back on HDFS
Detailed Implementation Flow
(Diagram with seven numbered steps; not preserved in this text version.)
Apache Pig Architecture
Apache Pig Components
Pig Latin: made up of a series of operations or transformations that are applied to the input data to produce output.
Pig Execution:
• Script: contains Pig commands in a file (.pig)
• Grunt: interactive shell for running Pig commands
• Embedded: runs Pig scripts from within a Java program
Pig Running Modes
You can run Apache Pig in 2 modes:
MapReduce mode: This is the default mode, which requires access to a Hadoop cluster and an HDFS installation. The input and output in this mode are on HDFS.
Command: pig
Local mode: Everything runs on a single machine, using the local host and the local file system. Local mode is selected with the -x flag (pig -x local). The input and output in this mode are on the local file system.
Command: pig -x local
Before moving on to the hands-on part, let us understand the data model in Pig
Pig Data Model
Data model types: Atom, Tuple, Bag, Map
Pig Data Model – Tuple and Bag
• A tuple is an ordered set of fields, and each field may hold a different data type.
Example of a tuple: (1, Linkin Park, 7, California)
• A bag is an unordered collection of tuples. Each tuple may be an entire row of a table or a subset of its fields, so the tuples in a bag need not have the same number of fields.
Example of a bag: {(Linkin Park, 7, California), (Metallica, 8), (Mega Death, Los Angeles)}
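As a rough analogy only, the tuple and bag examples above can be modelled with Python's built-in types. This is a sketch in plain Python, not Pig's actual implementation; the helper `field_counts` is a hypothetical name introduced here to show that tuples in one bag may differ in length.

```python
# A Pig tuple resembles a Python tuple: ordered fields of mixed types.
band = (1, "Linkin Park", 7, "California")

# A bag is a collection of tuples. A list is used here for simplicity,
# although a Pig bag carries no guaranteed order.
bag = [
    ("Linkin Park", 7, "California"),
    ("Metallica", 8),
    ("Mega Death", "Los Angeles"),
]


def field_counts(tuples):
    """Return the number of fields in each tuple of the bag."""
    return [len(t) for t in tuples]


print(band[1])            # fields are accessed by position
print(field_counts(bag))  # the tuples do not all have the same length
```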
Pig Data Model – Map and Atom
• A map is a set of key-value pairs used to represent data elements.
Example of maps: [band#Linkin Park, members#7], [band#Metallica, members#8]
• Atoms are simple scalar values of the kind found in most languages: int, long, float, double, chararray (string) and bytearray (byte[]).
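The `key#value` notation of the map examples corresponds closely to a dictionary. The sketch below, in plain Python, parses the notation exactly as it is written on the slide into a `dict`. The function name `parse_pig_map` is hypothetical, values are kept as strings, and the parser assumes that keys and values contain no commas or `#` characters.

```python
def parse_pig_map(text):
    """Parse a slide-style map such as '[band#Metallica, members#8]' into a dict.

    Values are kept as strings; no type conversion is attempted.
    """
    inner = text.strip().strip("[]").strip()
    if not inner:
        return {}
    result = {}
    for pair in inner.split(","):
        key, _, value = pair.partition("#")
        result[key.strip()] = value.strip()
    return result


print(parse_pig_map("[band#Linkin Park, members#7 ]"))
```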
Pig Operators
Operator   Description
LOAD       Loads data from the local file system or HDFS into Pig
FOREACH    Generates data transformations based on columns of data
FILTER     Selects tuples from a relation based on a condition
JOIN       Joins two or more relations on a common field
ORDER BY   Sorts a relation on one or more fields
STORE      Saves results to the local file system or HDFS
DISTINCT   Removes duplicate tuples from a relation
GROUP      Groups together the tuples that share the same group key (key field)
COGROUP    Same as GROUP, but used when multiple relations are involved
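To give a feel for what some of these operators compute, here is a small sketch in plain Python that mimics FILTER, DISTINCT, ORDER BY and GROUP on an in-memory list of tuples. The function names (`pig_filter` and so on) and the sample relation are hypothetical; real Pig operators run as distributed jobs and are written in Pig Latin, not Python.

```python
from collections import defaultdict

# Hypothetical relation: (band, members, state).
bands = [
    ("Linkin Park", 7, "California"),
    ("Metallica", 8, "California"),
    ("Linkin Park", 7, "California"),
]


def pig_filter(relation, predicate):
    """FILTER: keep the tuples that satisfy a condition."""
    return [t for t in relation if predicate(t)]


def pig_distinct(relation):
    """DISTINCT: drop duplicate tuples (the first occurrence is kept here)."""
    return list(dict.fromkeys(relation))


def pig_order_by(relation, field):
    """ORDER BY: sort on one field, given by its position in the tuple."""
    return sorted(relation, key=lambda t: t[field])


def pig_group(relation, field):
    """GROUP: collect the tuples that share a key into one bag per key."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[field]].append(t)
    return dict(groups)


print(pig_filter(bands, lambda t: t[1] > 7))
print(pig_order_by(pig_distinct(bands), 1))
print(pig_group(bands, 2))
```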
Let us execute a few Pig commands in the Grunt shell
Analysing Logs Using Apache Pig
• There is an application which processes sampleclass recordings.
• A log file records all the events that happen while the application is running.
• We will analyse this log file to find out which types of events occur in it and how many times each type occurs.
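The goal of the analysis can be sketched in plain Python before writing it as a Pig script. The log format below (`<date> <time> <EVENT_TYPE> <message>`) and the sample lines are assumptions made purely for illustration; the slides do not show the real log file. The function name `count_events` is likewise hypothetical.

```python
from collections import Counter

# Hypothetical log lines; the real log format is not shown in the slides.
SAMPLE_LOG = """\
2017-01-05 10:00:01 INFO recording started
2017-01-05 10:00:02 WARN low disk space
2017-01-05 10:00:03 INFO recording stopped
2017-01-05 10:00:04 ERROR upload failed
"""


def count_events(log_text):
    """Count how many times each event type (third field) occurs."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split(maxsplit=3)
        if len(parts) >= 3:  # skip blank or malformed lines
            counts[parts[2]] += 1
    return counts


for event_type, n in count_events(SAMPLE_LOG).items():
    print(event_type, n)
```

The same two steps, extracting the event type from each line and then grouping and counting, are what the Pig script has to express.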
Create and Run a Pig Script
to Analyze the logs
Learning Resources
• Hadoop Tutorial: www.edureka.co/blog/hadoop-tutorial
• Pig Tutorial: https://www.edureka.co/blog/pig-tutorial
• Operators in Pig: https://www.edureka.co/blog/operators-in-apache-pig/
Thank You …
Questions/Queries/Feedback

Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka