This document provides an introduction to Spark, covering the following points:
- Spark is a general-purpose computational framework that is more flexible than MapReduce while retaining properties like scalability and fault tolerance.
- Spark concepts include resilient distributed datasets (RDDs), transformations that lazily define new RDDs from existing ones, and actions that trigger the computation, materializing RDDs and returning values to the driver program.
- Spark can run on standalone clusters or as part of Cloudera's Enterprise Data Hub, and examples of its use include machine learning, streaming, and SQL queries.
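
The distinction between lazy transformations and actions can be illustrated with a small sketch. This is a toy, single-machine model in plain Python, not Spark itself: the class `ToyRDD` and the helper `square` are hypothetical names introduced only for illustration. The method names (`parallelize`, `map`, `filter`, `collect`, `count`, `reduce`) are borrowed from Spark's RDD API, but everything about partitioning, distribution, and fault tolerance is left out.

```python
from functools import reduce as _reduce


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical): stores a recipe, not computed data."""

    def __init__(self, compute):
        # compute is a zero-argument function that returns a fresh iterator
        self._compute = compute

    @classmethod
    def parallelize(cls, data):
        items = list(data)
        return cls(lambda: iter(items))

    # Transformations: return a new ToyRDD immediately; nothing is computed yet.
    def map(self, f):
        return ToyRDD(lambda: (f(x) for x in self._compute()))

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)))

    # Actions: run the recorded chain of transformations and return a plain value.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())

    def reduce(self, f):
        return _reduce(f, self._compute())


calls = []


def square(x):
    calls.append(x)  # record each invocation so that laziness is observable
    return x * x


numbers = ToyRDD.parallelize(range(1, 6))
evens_squared = numbers.map(square).filter(lambda v: v % 2 == 0)

calls_before_action = len(calls)  # square has not run: only a recipe exists
result = evens_squared.collect()  # the action triggers the whole chain
calls_after_action = len(calls)   # square has now run once per element
```

In this sketch, building `evens_squared` does no work; `square` is only invoked once `collect()` is called. Real Spark behaves analogously at cluster scale, recording the chain of transformations and executing it when an action requests a result.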