This document provides an introduction to Apache Spark and collaborative filtering. It discusses big data and the limitations of MapReduce, then introduces Apache Spark including Resilient Distributed Datasets (RDDs), transformations, actions, and DataFrames. It also covers Spark Machine Learning (ML) libraries and algorithms such as classification, regression, clustering, and collaborative filtering.