This document provides an introduction to big data analytics and Hadoop. It discusses:
1) The characteristics of big data, including its scale, its complexity, and the speed at which it is generated. Big data requires new techniques and architectures to manage large, diverse datasets and to extract value from them.
2) An overview of Hadoop, an open-source framework for the distributed storage and processing of large datasets across clusters of computers. Hadoop includes the Hadoop Distributed File System (HDFS) and the MapReduce programming model.
3) The course goals: students will learn how to manage large datasets with Hadoop, write jobs in languages such as Java and Python, and use tools such as Pig, Hive, RHadoop, and Mahout to perform advanced analytics on