Spark is a fast, general-purpose engine for large-scale data processing. It was developed at UC Berkeley in 2009 and can run programs up to 100x faster than Hadoop MapReduce when the data fits in memory, or up to 10x faster on disk. Its basic abstraction is the Resilient Distributed Dataset (RDD), a collection of elements partitioned across a cluster so that operations on it can run in parallel. The document provides examples of using Spark for word count, SQL queries, and notebooks.
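
The word-count example mentioned above can be illustrated without a Spark installation. The following is a minimal plain-Python sketch of the idea behind operating on partitioned data, not Spark's own API: each partition is counted independently and the partial results are then merged. The names `count_partition`, `word_count`, and the sample input are hypothetical; in Spark itself the same shape is usually written with the RDD operations `flatMap`, `map`, and `reduceByKey`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_partition(lines):
    """Count words in one partition (the per-partition 'map' side)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(partitions, max_workers=4):
    """Count words across partitions, handling each partition independently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_partition, partitions))
    # Merge the per-partition counts (the 'reduce' side).
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical input, already split into two partitions.
partitions = [
    ["spark is fast", "spark is general"],
    ["hadoop is on disk"],
]
counts = word_count(partitions)
```

Threads are used here only to show that partitions can be processed independently; a real Spark job distributes partitions across the executors of a cluster and adds fault tolerance, which this sketch does not attempt.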