This document presents Resilient Distributed Datasets (RDDs), a fault-tolerant abstraction for in-memory cluster computing introduced by Spark. RDDs allow programmers to perform iterative and interactive computations over large datasets in a fault-tolerant manner. RDDs are distributed immutable collections of records that can be operated on through transformations and actions. They track the lineage of transformations to allow recovering lost data partitions. This provides an efficient abstraction for iterative algorithms compared to MapReduce.