This document provides an overview of using OpenStack and Sahara to implement a big data architecture on cloud infrastructure. It discusses:
- The characteristics and service models of cloud computing
- An introduction to OpenStack, why it is used, and some of its key statistics
- What Sahara is and the role it plays in provisioning and managing Hadoop, Spark, and Storm clusters on OpenStack
- Sahara's architecture, its integration with OpenStack, and examples of using it to quickly provision data processing clusters and run analytic jobs on cloud infrastructure