This document provides an overview of Pig Latin, a data flow language used for analyzing large datasets. Pig Latin scripts are compiled into MapReduce programs that can run on Hadoop. The key points covered include:
- Pig Latin expresses data transformations such as filtering, joining, and grouping as a sequence of high-level steps, combining SQL-style operators with a procedural data flow. Each step is compiled into MapReduce jobs.
- It features a rich, nested data model (atoms, tuples, bags, and maps) for representing complex data structures read from files.
- User-defined functions (UDFs) allow custom processing, such as extracting terms from documents or checking for spam.
- The language provides commands like LOAD, FOREACH, FILTER, JOIN to load, transform and analyze data in parallel across
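
As a rough illustration of the operators listed above, the following sketch emulates a small FILTER, GROUP, FOREACH pipeline in plain Python. It is not Pig and does not run on Hadoop; it only models what each step computes on a single machine. The input records, field names, file name, and the helper functions `filter_by`, `group_by`, and `foreach` are all hypothetical, and the Pig Latin statements in the comments are an assumed equivalent rather than code taken from the document.

```python
from collections import defaultdict

# Hypothetical input: each record is a tuple (user, url, time), roughly what
#   visits = LOAD 'visits.txt' AS (user, url, time);
# might yield. The file name and fields are made up for this sketch.
visits = [
    ("amy", "cnn.com", 8),
    ("amy", "bbc.com", 12),
    ("bob", "cnn.com", 15),
    ("eve", "cnn.com", 3),
]


def filter_by(bag, predicate):
    """FILTER: keep only the tuples that satisfy the predicate."""
    return [t for t in bag if predicate(t)]


def group_by(bag, key):
    """GROUP: produce one (key, bag) pair per distinct key (nested data)."""
    groups = defaultdict(list)
    for t in bag:
        groups[key(t)].append(t)
    # Sorted only to make this sketch's output deterministic.
    return sorted(groups.items())


def foreach(bag, generate):
    """FOREACH ... GENERATE: apply an expression to every tuple."""
    return [generate(t) for t in bag]


# recent = FILTER visits BY time > 5;
recent = filter_by(visits, lambda t: t[2] > 5)

# by_url = GROUP recent BY url;
by_url = group_by(recent, lambda t: t[1])

# counts = FOREACH by_url GENERATE group, COUNT(recent);
counts = foreach(by_url, lambda g: (g[0], len(g[1])))

print(counts)  # [('bbc.com', 1), ('cnn.com', 2)]
```

Each intermediate name (`recent`, `by_url`, `counts`) corresponds to one step of the data flow, which is the style Pig Latin encourages: every statement names the result of a single transformation. The grouped result holds a bag of full tuples per key, a small instance of the nested data model mentioned above.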