This is the code repository for Mastering Machine Learning with Spark 2.x, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.
This book shows you how to transform data into actionable knowledge. It begins by defining machine learning primitives using the MLlib and H2O libraries. You will learn how to use binary classification to detect the Higgs boson particle in the huge amount of data produced by CERN's particle collider, and how to classify daily health activities using ensemble methods.
All of the code is organized into folders, one per chapter; for example, Chapter02.
The repository includes the following chapters:
- Chapter 2: Detecting Dark Matter - The Higgs-Boson Particle
- Chapter 3: Ensemble Methods for Multi-class Classification
- Chapter 4: Predicting Movie Reviews Using NLP and Spark Streaming
- Chapter 5: Word2Vec for Prediction and Clustering
- Chapter 6: Extracting Patterns from Clickstream Data
- Chapter 7: Graph Analytics with GraphX
- Chapter 8: Lending Club Loan Prediction
Note: Chapter 01 does not contain code.
The code samples in this book use Apache Spark 2.1 and its Scala API. We also use the Sparkling Water package to access the H2O machine learning library. Each chapter shows how to start Spark using spark-shell and how to download the data needed to run the code, and each chapter also contains its code in the form of a regular Spark application.
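In spark-shell, a SparkSession is already available as `spark`; a regular Spark application has to create one itself. The following is a minimal sketch of that structure, assuming Spark 2.1 with the `spark-sql` module on the classpath. The `MinimalSparkApp` object, the `countByKey` helper, and the toy data are hypothetical and are not taken from the book's chapters; H2O and Sparkling Water are left out to keep it minimal.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; it is not one of the book's chapter applications.
object MinimalSparkApp {

  // Counts how many rows share each key, using the DataFrame API.
  def countByKey(spark: SparkSession, rows: Seq[(String, Int)]): Map[String, Long] = {
    import spark.implicits._
    rows.toDF("key", "value")
      .groupBy("key")
      .count() // adds a LongType column named "count"
      .collect()
      .map(row => row.getString(0) -> row.getLong(1))
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process on all cores; on a cluster, the master is
    // usually supplied by spark-submit instead of being hard-coded here.
    val spark = SparkSession.builder()
      .appName("MinimalSparkApp")
      .master("local[*]")
      .getOrCreate()

    try {
      // A tiny in-memory dataset stands in for the data each chapter downloads.
      val counts = countByKey(spark, Seq(("a", 1), ("b", 2), ("a", 3)))
      println(counts)
    } finally {
      spark.stop()
    }
  }
}
```

Packaged into a JAR, an application like this would typically be launched with `spark-submit --class MinimalSparkApp <jar>`.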
In summary, the basic requirements to run the code provided in this book include:
- Java 8
- Spark 2.1
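To confirm that an environment meets these requirements, a small check such as the one below could be run. It is a sketch that assumes Spark's core library is on the classpath; the `CheckRequirements` object and its `matches` helper are hypothetical. It relies on the `org.apache.spark.SPARK_VERSION` constant and on Java 8 reporting its version as `1.8.0_...`.

```scala
import org.apache.spark.SPARK_VERSION

// Hypothetical helper object for checking the Java and Spark versions.
object CheckRequirements {

  // True if `actual` equals `expectedPrefix` or starts with it followed by a dot,
  // so "2.1" matches "2.1.0" but not "2.10.0".
  def matches(actual: String, expectedPrefix: String): Boolean =
    actual == expectedPrefix || actual.startsWith(expectedPrefix + ".")

  def main(args: Array[String]): Unit = {
    // Java 8 reports its version as "1.8.0_<update>".
    val javaVersion = sys.props("java.version")
    println(s"Java:  $javaVersion (Java 8: ${matches(javaVersion, "1.8")})")
    println(s"Spark: $SPARK_VERSION (Spark 2.1: ${matches(SPARK_VERSION, "2.1")})")
  }
}
```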
The project uses Gradle as its build system. To build it, run:
> ./gradlew build
To list all projects, run:
> ./gradlew projects
Each example can be run as described in the book, or directly via Gradle, for example:
> ./gradlew :mastering-ml-w-spark-chapter02:run
If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.
https://packt.link/free-ebook/9781785283451