This document provides an overview of Hadoop, including:
1. Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity hardware.
2. It describes the architecture of Hadoop, including the Hadoop Distributed File System (HDFS) and MapReduce engine. HDFS uses a master/slave architecture with a NameNode and DataNodes, while MapReduce uses a JobTracker and TaskTrackers.
3. It discusses some common uses of Hadoop in industry, such as for log processing, web search indexing, and ad-hoc queries at large companies like Yahoo, Facebook, and Amazon.