REAL TIME PROJECT:
Click Stream Data Analytics Report Project
ClickStream Data
ClickStream data is generated by any activity a user performs in a web application.
What does user activity on a website look like? Suppose I log into Amazon: I may navigate through
some pages, spend time on certain pages, and click on certain things. All of these activities,
including reaching a particular page or application, clicking, navigating from one page to another,
and spending time on a page, make up a set of data that the web application logs. This data is known
as ClickStream data. It has high business value, especially for e-commerce applications and for
anyone who wants to understand their users’ behavior.
More formally, ClickStream data can be defined as data about the links that a user clicked, including
the point in time at which each one was clicked. E-commerce businesses mine and analyse ClickStream
data on their own websites. Most e-commerce applications have a built-in system that mines this
information.
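As a rough illustration of the definition above, a single click can be modelled as a record of who clicked, which link, and when. The sketch below is only an assumption about how such a log might look: the tab-separated layout, the `ClickEvent` class, and the `parse_click` helper are hypothetical and are not taken from the project's actual data file.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClickEvent:
    """One click: who clicked, which link, and when."""
    user_id: str
    url: str
    clicked_at: datetime


def parse_click(line: str) -> ClickEvent:
    # Assumed (hypothetical) layout: user_id <TAB> ISO-8601 timestamp <TAB> url
    user_id, timestamp, url = line.rstrip("\n").split("\t")
    return ClickEvent(user_id, url, datetime.fromisoformat(timestamp))


# Made-up sample lines, for illustration only.
sample_log = [
    "u1\t2020-01-01T10:00:00\t/home",
    "u1\t2020-01-01T10:00:40\t/product/42",
    "u2\t2020-01-01T10:01:05\t/home",
]
events = [parse_click(line) for line in sample_log]

# The gap between two consecutive clicks by the same user approximates
# the time that user spent on the first page.
time_on_home = (events[1].clicked_at - events[0].clicked_at).total_seconds()
print(time_on_home)
```

Keeping the timestamp as a `datetime` rather than a string is what makes "time spent on a page" a simple subtraction.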
ClickStream Analytics
Using ClickStream data adds a lot of value to businesses and can help them attract more customers or
visitors. Based on the navigation paths that people take, it helps them understand whether the
application works as intended and whether users' experience of it is good or bad. They can also
predict which page a user is most likely to visit next and do ad targeting as well. With this, they
can understand the needs of users and come up with better recommendations. Several other analyses are
possible using ClickStream data.
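One simple way to picture next-page prediction is to count, across past sessions, which page most often follows each page. This is a minimal sketch of that idea, not the method used by any particular e-commerce site; the session data and the helper names `build_transitions` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each page is followed directly by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current_page, next_page in zip(pages, pages[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequent follower of `page`, or None if it was never seen."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical navigation paths, one list of pages per user session.
sessions = [
    ["/home", "/search", "/product/42", "/cart"],
    ["/home", "/search", "/product/7"],
    ["/home", "/deals", "/product/42", "/cart"],
]
transitions = build_transitions(sessions)
print(predict_next(transitions, "/home"))
print(predict_next(transitions, "/product/42"))
```

The same transition counts could also feed recommendations or ad targeting, since they show which pages tend to lead to which.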
Project Scope
In this project, candidates are given sample click stream data, taken from a web application and
supplied in a text file, along with problem statements.
➱ User information in a MySQL database.
➱ Click stream data in a text file generated from the web application.
Each candidate has to come up with a high-level system architecture design based on the Hadoop
ecosystem components covered during the course. Each candidate has to present the high-level system
architecture along with the chosen ecosystem components, and the pros and cons will be discussed with
all the other candidates. Finally, the best possible system design approach will be chosen for
implementation.
Candidates are given instructions to create an Oozie workflow with the respective Hadoop ecosystem
components finalized in the discussion. Candidates have to submit the project for the given problem
statement, and it will be validated by the trainer individually before course completion.
Ecosystem components involved in the Click Stream Analytics project
➱ HDFS
➱ Sqoop
➱ Pig
➱ Hive
➱ Oozie
Big Data Hadoop Course Content
Chapter 1: Introduction to Big Data-Hadoop
➱ Overview of Hadoop Ecosystem
➱ Role of Hadoop in Big Data – Overview of other Big Data Systems
➱ Who is using Hadoop
➱ Hadoop integrations into Existing Software Products
➱ Current Scenario in Hadoop Ecosystem
➱ Installation
➱ Configuration
➱ Use Cases of Hadoop (HealthCare, Retail, Telecom)
Chapter 2 : HDFS
➱ Concepts
➱ Architecture
➱ Data Flow (File Read, File Write)
➱ Fault Tolerance
➱ Shell Commands
➱ Data Flow Archives
➱ Coherency - Data Integrity
➱ Role of Secondary Name Node
Chapter 3 : MapReduce
➱ Theory
➱ Data Flow (Map – Shuffle – Reduce)
➱ MapRed vs MapReduce APIs
➱ Programming [Mapper, Reducer, Combiner, Partitioner]
➱ Writables
➱ Input Format
➱ Output format
➱ Streaming API using Python
➱ Inherent Failure Handling using Speculative Execution
➱ Magic of Shuffle Phase
➱ File Formats
➱ Sequence Files
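The Map – Shuffle – Reduce data flow listed in this chapter can be pictured with a word count run in a single Python process. This is only a teaching sketch under that assumption: it does not use the Hadoop API, and the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key before they reach the reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse the values of one key into a single total."""
    return key, sum(values)


# Made-up input lines, for illustration only.
lines = ["big data hadoop", "hadoop hive pig", "big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(word_counts)
```

In a real cluster the mappers and reducers run on different nodes and the framework performs the shuffle over the network; here all three phases are ordinary function calls.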
Chapter 4: HBase
➱ Introduction to NoSQL
➱ CAP Theorem
➱ Classification of NoSQL
➱ HBase and RDBMS
➱ HBase and HDFS
➱ Architecture (Read Path, Write Path, Compactions, Splits)
➱ Installation
➱ Configuration
➱ Role of ZooKeeper
➱ HBase Shell - Introduction to Filters
➱ Row Key Design - What’s New in HBase - Hands-On
Chapter 5 : Hive
➱ Architecture
➱ Installation
➱ Configuration
➱ Hive vs RDBMS
➱ Tables
➱ DDL
➱ DML
➱ UDF
➱ Partitioning
➱ Bucketing
➱ Hive functions
➱ Date functions
➱ String functions
➱ Cast function - Metastore
➱ Joins
➱ Real-time HQL will be shared along with database migration project
Chapter 6 : Pig
➱ Architecture
➱ Installation
➱ Hive vs Pig
➱ Pig Latin Syntax
➱ Data Types
➱ Functions (Eval, Load/Store, String, Date Time)
➱ Joins
➱ UDFs - Performance
➱ Troubleshooting
➱ Commonly Used Functions
Chapter 7 : Sqoop
➱ Architecture, Installation, Commands (Import, Hive-Import, Eval, HBase Import, Import All Tables, Export)
➱ Connectors to Existing DBs and DW
Practicals
➱ Sqoop to import real-time weblogs from the application to the DB and try to export the same to MySQL
Chapter 8 : Kafka
➱ Kafka introduction
➱ Data streaming Introduction
➱ Producer-consumer-topics
➱ Brokers
➱ Partitions
➱ Unix Streaming via Kafka
Practicals
Kafka
➱ Producer and subscriber setup; publish messages on a topic from the producer to the subscriber
Chapter 9 : Oozie
➱ Architecture
➱ Installation
➱ Workflow
➱ Coordinator
➱ Actions (MapReduce, Hive, Pig, Sqoop)
➱ Introduction to Bundle
➱ Mail Notifications
Chapter 10: Hadoop 2.0 and Spark
➱ Limitations in Hadoop
➱ HDFS Federation
➱ High Availability in HDFS
➱ HDFS Snapshots
➱ Other Improvements in HDFS 2
➱ Introduction to YARN aka MR2
➱ Limitations in MR1
➱ Architecture of YARN
➱ MapReduce Job Flow in YARN
➱ Introduction to Stinger Initiative and Tez
➱ Backward Compatibility for Hadoop 1.x
➱ Spark Fundamentals
➱ RDD - Sample Scala Program - Spark Streaming
Practicals
➱ Difference between Spark 1.x and Spark 2.x
➱ Word count program in PySpark
Chapter 11: Big Data Use cases
➱ Hadoop
➱ HDFS architecture and usage
➱ MapReduce Architecture and real time exercises
➱ Hadoop Eco systems
➱ Sqoop - MySQL DB migration
➱ Hive - Deep dive
➱ Pig - weblog parsing and ETL
➱ Oozie - Workflow scheduling
➱ Flume - weblogs ingestion
➱ NoSQL
➱ HBase
➱ Apache Kafka
➱ Pentaho ETL tool integration & working with Hadoop eco system
➱ Apache Spark
➱ Introduction and working with RDD.
➱ Multi node Setup Guidance
➱ Hadoop latest version Pros & cons discussion
➱ Ends with an introduction to Data Science.
Chapter 12: Real Time Project
➱ Getting applications web logs
➱ Getting user information from MySQL via Sqoop
➱ Getting extracted data from Pig script
➱ Creating Hive SQL Table for querying
➱ Creating Reports from HiveQL
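The project steps above can be pictured end to end with a small in-memory stand-in. This sketch uses plain Python instead of Sqoop, Pig, and Hive, so it only mirrors the roles of those steps. The user columns, the comma-separated log layout, and the helpers `extract` and `clicks_per_city` are assumptions made for illustration, not the project's actual schema or scripts.

```python
from collections import Counter

# Stand-in for user rows imported from MySQL via Sqoop (hypothetical columns).
users = {
    "u1": {"city": "Chennai"},
    "u2": {"city": "Bengaluru"},
}

# Stand-in for the web application's click log (hypothetical layout:
# user_id,timestamp,url). The last line is deliberately malformed.
raw_log = [
    "u1,2020-01-01T10:00:00,/home",
    "u1,2020-01-01T10:00:40,/product/42",
    "u2,2020-01-01T10:01:05,/home",
    "bad line without enough fields",
]


def extract(lines):
    """Parsing and cleaning step (the role the outline gives to the Pig script)."""
    for line in lines:
        fields = line.split(",")
        if len(fields) != 3:
            continue  # skip malformed records
        user_id, timestamp, url = fields
        yield {"user_id": user_id, "timestamp": timestamp, "url": url}


def clicks_per_city(clicks, user_table):
    """Report step (the role of a Hive query): join clicks to users, count by city."""
    counts = Counter()
    for click in clicks:
        user = user_table.get(click["user_id"])
        if user is not None:
            counts[user["city"]] += 1
    return dict(counts)


report = clicks_per_city(extract(raw_log), users)
print(report)
```

In the course project each of these stages would be a separate action chained by an Oozie workflow, with HDFS holding the intermediate data.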
