This document provides an overview of Apache Hadoop, covering its history, architecture, and key components. Hadoop is an open-source framework for the distributed storage and processing of large datasets across clusters of commodity servers, using a simple programming model. The document traces Hadoop's origins to Google's papers on the Google File System (GFS) and MapReduce, and describes Hadoop's two core components: the Hadoop Distributed File System (HDFS) for storage and MapReduce for distributed processing. Common use cases, including log analysis, search, and analytics, are also mentioned.
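To make the MapReduce programming model concrete, below is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API. The mapper emits a (word, 1) pair for each token it sees, and the reducer sums the counts per word; HDFS supplies the input and receives the output. Class names such as TokenizerMapper and IntSumReducer are illustrative, and the input/output paths are assumed to be passed as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs once per input split; emits (word, 1) for every token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: receives all counts for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory (assumed)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (assumed)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The key design point this illustrates is that the programmer writes only two small functions; the framework handles splitting the input across HDFS blocks, scheduling map and reduce tasks near the data, and shuffling intermediate pairs between the two phases.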