Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It allows for the reliable, scalable, and distributed processing of large data sets across commodity hardware. The core of Hadoop consists of HDFS for storage and MapReduce for processing data in parallel on multiple nodes. The Hadoop ecosystem includes additional projects that extend the functionality of the core components.