Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It grew out of two Google papers: one describing the Google File System (GFS) for reliable distributed data storage, and one describing MapReduce for distributed parallel processing. Hadoop implements these ideas as HDFS for storage and MapReduce for processing, both designed to scale out and to tolerate node failures. A growing ecosystem of related projects adds further capabilities: Hive and Pig for SQL-like and dataflow-style queries, HBase for low-latency random-access storage, ZooKeeper for distributed coordination, Spark for fast in-memory processing, and more. Major vendors providing Hadoop distributions include Hortonworks and Cloudera, which merged under the Cloudera name in 2019.
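
To make the MapReduce programming model concrete, below is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API: the mapper emits a (word, 1) pair for every token it sees, and the reducer sums the counts for each word. The class names and input/output paths here are illustrative placeholders, not part of any particular distribution.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums all counts emitted for the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory on HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory on HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a JAR, a job like this would typically be submitted with `hadoop jar wordcount.jar WordCount <input dir> <output dir>`, with both paths living on HDFS; the framework handles splitting the input, scheduling map and reduce tasks across the cluster, and re-running tasks that fail.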