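Apache Spark is an open-source framework for large-scale data processing. It was originally developed at UC Berkeley's AMPLab and provides fast, easy-to-use APIs for both batch and streaming workloads. On top of its core engine, Spark ships libraries for SQL queries (Spark SQL), machine learning (MLlib), stream processing (Spark Streaming and Structured Streaming), and graph processing (GraphX). Because Spark can keep intermediate data in memory instead of writing it to disk between steps, it has been reported to run up to 100 times faster than Hadoop MapReduce on iterative algorithms and interactive queries, although actual speedups depend heavily on the workload. Its core abstraction is the Resilient Distributed Dataset (RDD). An RDD is an immutable collection partitioned across a cluster. It can be cached in memory and reused across parallel operations, and any lost partitions can be rebuilt from the lineage of transformations that produced them.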
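The following single-process Python sketch illustrates the RDD idea without requiring a Spark cluster. `ToyRDD` and `parallelize` are hypothetical stand-ins. They only mimic the shape of Spark's RDD operations (`map`, `filter`, `cache`, `unpersist`, `count`, `collect`) and are not PySpark's API. The sketch models three properties. First, transformations are lazy and are recorded as lineage. Second, a cached dataset is computed once and then reused by later actions. Third, a dataset whose cached data is dropped can be recomputed from its lineage.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical, single-process model of an RDD (not Spark's API).

    Each ToyRDD is immutable and remembers its parent plus the per-partition
    function that derives it from that parent: this chain is its lineage.
    """

    def __init__(self, partitions: Optional[List[list]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[list], list]] = None):
        self._partitions = partitions   # only set for a source dataset
        self._parent = parent
        self._fn = fn
        self._cached: Optional[List[list]] = None
        self._should_cache = False
        self.compute_count = 0          # how many times this dataset was computed

    # --- Transformations: lazy, they only record lineage ---
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def cache(self) -> "ToyRDD":
        # Mark for in-memory reuse; nothing is computed yet.
        self._should_cache = True
        return self

    def unpersist(self) -> "ToyRDD":
        # Drop the cached partitions (e.g. simulating lost data).
        self._cached = None
        self._should_cache = False
        return self

    # --- Internal evaluation: walk the lineage back to the source ---
    def _compute(self) -> List[list]:
        if self._cached is not None:
            return self._cached         # reuse without recomputation
        self.compute_count += 1
        if self._parent is None:
            parts = self._partitions
        else:
            parts = [self._fn(p) for p in self._parent._compute()]
        if self._should_cache:
            self._cached = parts
        return parts

    # --- Actions: force evaluation ---
    def collect(self) -> list:
        return [x for part in self._compute() for x in part]

    def count(self) -> int:
        return sum(len(part) for part in self._compute())


def parallelize(data: list, num_partitions: int = 2) -> ToyRDD:
    """Hypothetical helper: split a list into contiguous partitions."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    return ToyRDD(partitions=parts)


base = parallelize(list(range(1, 11)), num_partitions=3)
squares = base.map(lambda x: x * x).cache()
evens = squares.filter(lambda x: x % 2 == 0)

print(squares.count())         # 10
print(evens.collect())         # [4, 16, 36, 64, 100]
print(squares.compute_count)   # 1: the second action reused the cache
```

Without `.cache()`, every action re-walks the lineage from the source, and `compute_count` grows with each action. After `unpersist()`, the next action recomputes the same values from lineage, which loosely mirrors how Spark rebuilds lost partitions. In real PySpark the analogous calls are `sc.parallelize(...)`, `rdd.map(...)`, `rdd.cache()`, and `rdd.unpersist()`. There, the partitions live on cluster executors rather than in one Python list.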