The document discusses big data and Hadoop. It notes that big data is characterized by volume, variety, velocity, and veracity. Hadoop is an open-source platform for distributed storage and processing of large datasets across clusters of commodity hardware. Hadoop consists of HDFS for storage and MapReduce as a programming model. Limitations of Hadoop 1.x include lack of horizontal scalability and high availability, while Hadoop 2.x addresses these with features like HDFS federation and YARN to support multiple workloads.
Related topics: