Big data refers to massive amounts of structured and unstructured data that are difficult to process with traditional methods because of their volume, velocity, or variety. Although the term most often describes volume, it can also refer to the technologies needed to handle data at that scale. A typical example is petabytes or exabytes of data, drawn from various sources, about millions of people. The document then provides steps for running a word count program with Hadoop on a Hortonworks Sandbox virtual machine.
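
As a rough illustration of what a word count job computes, the sketch below mimics the map, shuffle, and reduce phases in plain Python. It does not use Hadoop or the sandbox, and it is not the program the document runs. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by word, as a framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped values to get one total per word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory input."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read files from HDFS.
    sample = ["big data big", "Data hadoop"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

On a real cluster the map and reduce steps run in parallel across many machines and the framework handles the grouping, which is what makes the same idea workable at petabyte scale.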