Apache Spark is an open-source cluster-computing framework for fast, large-scale data processing. Its core data abstraction is the resilient distributed dataset (RDD), a fault-tolerant collection that is partitioned across the nodes of a cluster, can be kept in memory, and supports parallel operations on large datasets. Spark provides APIs in Java, Scala, Python, and R. Its core engine supports interactive data analysis, and high-level libraries built on it cover SQL, stream processing, machine learning, and graph processing.
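The idea behind parallel operations on a partitioned dataset can be illustrated with a small, self-contained sketch. The code below is a conceptual toy in plain Python, not Spark's API: the class `ToyRDD` and its methods are hypothetical names chosen for illustration, worker threads stand in for cluster nodes, and the sketch assumes that transformations are only recorded until a result is requested.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Conceptual stand-in for a partitioned dataset; NOT the Spark API."""

    def __init__(self, partitions, steps=()):
        self._partitions = partitions  # source data, split into chunks
        self._steps = tuple(steps)     # recorded transformations, not yet applied

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        """Split data into contiguous chunks, one per (simulated) worker."""
        items = list(data)
        size = max(1, -(-len(items) // num_partitions))  # ceiling division
        return cls([items[i:i + size] for i in range(0, len(items), size)])

    def map(self, fn):
        # Record the step only; nothing is computed here.
        return ToyRDD(self._partitions, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._partitions, self._steps + (("filter", pred),))

    def _compute(self, partition):
        """Replay every recorded step on a single partition."""
        out = list(partition)
        for kind, fn in self._steps:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

    def collect(self):
        """Process partitions concurrently and concatenate the results in order."""
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(self._compute, self._partitions))
        return [x for part in parts for x in part]


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
squares_of_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
result = squares_of_evens.collect()
print(result)  # [4, 16, 36, 64, 100]
```

Because each partition is computed from the source chunk plus the recorded steps, a lost partition could in principle be rebuilt by replaying those steps on that chunk alone. The sketch only hints at this; it does not implement recovery, caching, or distribution across machines.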