Homologous Apache Spark Clusters Using Nomad with Alex Dadgar

HOMOLOGOUS SPARK CLUSTERS USING NOMAD

Alex Dadgar
Team Lead of Nomad
HashiCorp

We’re in a transitional period
• Migrations to cloud
• Monolithic applications giving way to microservices
• DevOps and an explosion of tooling
• Rise of cluster schedulers

Big Data is here to stay
• 2014 IDC Research Study
• 42% compound annual growth: nearly doubling every two years!
• 2010: 1 ZB
• 2020: 50 ZB

Nomad: Cluster Scheduler
• Easy for Developers
• Operationally Simple
• Built for Scale

example.nomad
# Define our simple redis job
job "redis" {
  # Run only in us-east-1
  datacenters = ["us-east-1"]

  # Define the single redis task using Docker
  task "redis" {
    driver = "docker"

    config {
      image = "redis:latest"
    }

    resources {
      cpu    = 500 # MHz
      memory = 256 # MB

      network {
        mbits = 10
        port "redis" {}
      }
    }
  }
}

Job Specification
• Declares what to run
• Nomad determines where to run it and manages how
• Powerful yet simple

Containerized: Docker, Rkt, LXC
Virtualized: Qemu / KVM
Standalone: Java Jar, Static Binaries

Containerized: Docker, Windows Server Containers, Rkt, LXC
Virtualized: Qemu / KVM, Hyper-V, Xen
Standalone: Java Jar, Static Binaries, C#

Single Region Architecture
Diagram: three servers in one region (one leader, two followers). The leader replicates state to the followers, and followers forward requests to the leader. Clients in DC1, DC2, and DC3 talk to the servers over RPC.

Built on Experience
• Gossip and consensus
• Mature libraries, proven design patterns

Built on Research
• Gossip and consensus

Nomad Million Container Challenge
1,000 Jobs
1,000 Tasks per Job
5,000 Hosts on GCE
1,000,000 Containers

“640 KB ought to be enough for anybody.” - Bill Gates

2nd Largest Hedge Fund
18K Cores
5 Hours
2,200 Containers/second

Today and in the Future
Benefits to Deploying on Nomad

Today: Uncompromised Spark
• The ./bin/spark-submit workflow is unchanged
• Supports Scala, Java, R, and Python
• Supports the Spark shell
• Dynamic Executors
• Run with or without Docker

Terminal
$ ./bin/spark-submit \
    --master nomad \
    --docker-image hashicorp/spark-nomad \
    --distribution local:///opt/spark \
    --class org.apache.spark.examples.SparkPi \
    local:/opt/spark/examples/jars/spark-examples_2.11-2.1.1.jar \
    10

Today: Shared Batch/Service Cluster
• No separate Spark and service clusters
• Higher density and reduced cost
• Operators manage one infrastructure
• Developers learn one tool

Google Borg Paper
“Figure 5 shows that segregating prod and non-prod work would need 20–30% more machines in the median cell to run our workload.”

Today: Security
• Integration with HashiCorp Vault
• Vault stores static secrets and can generate dynamic secrets, such as IAM credentials
• Don’t bake secrets into Spark jobs

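task "payment-api" {
  # …

  # Nomad obtains a Vault token with these policies on behalf of the task
  vault {
    policies = ["s3_user_data_rw"]
  }

  # Render short-lived AWS credentials from Vault and export them as env vars
  template {
    data = <<END
{{ with $secret := secret "aws/creds/deploy" }}
AWS_ACCESS_KEY_ID={{ $secret.Data.access_key }}
AWS_SECRET_ACCESS_KEY={{ $secret.Data.secret_key }}
{{ end }}
END
    destination = "secrets/aws_creds"
    env         = true
  }
}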

Today: Security
• Went to great lengths to minimize exposure of the Vault token
• Servers never see the token
• One-time access (tampering can be detected)
• Token is written to an in-memory file (tmpfs)
• Full talk: https://youtu.be/4gAYyAA6h9E
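A related knob in Nomad's `vault` stanza is `env`: setting it to `false` keeps the Vault token out of the task's environment, while Nomad still writes the token into the task's secrets directory as `secrets/vault_token`. The fragment below is a sketch that reuses the task and policy names from the `payment-api` example; it is not a complete job file.

```hcl
task "payment-api" {
  vault {
    policies = ["s3_user_data_rw"]

    # Don't export VAULT_TOKEN into the task's environment; the token is still
    # readable by the task from secrets/vault_token
    env = false
  }
}
```

Keeping the token out of the environment means it does not leak through process listings or child processes that inherit the environment.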

Today: Multi Region/DC
Diagram: two regions, Region A and Region B, each with three servers (one leader, two followers). Within each region the leader replicates to the followers and followers forward requests to the leader; the regions are connected by gossip and forward requests to each other.
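In a Nomad job file, the target region and the eligible datacenters are top-level job attributes. A minimal fragment, assuming hypothetical names: the region `us` and the datacenter `us-west-2` are placeholders, and groups and tasks are omitted.

```hcl
# Fragment of a Nomad job file; groups and tasks omitted for brevity
job "spark-etl" {
  # Hypothetical region name: the job is registered with this region's servers
  region = "us"

  # Datacenters within that region where allocations may be placed
  # ("us-west-2" is a hypothetical name)
  datacenters = ["us-east-1", "us-west-2"]

  type = "batch"
}
```

A request sent to a server in another region is forwarded to the servers of the job's region, which matches the region forwarding shown in the diagram.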

Today: Cron Spark Jobs
• Run a Spark job on a cron schedule
• Nomad is responsible for launching and managing the job
• Higher reliability
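Nomad's `periodic` stanza is what runs a batch job on a cron schedule. The sketch below wraps the SparkPi submission from the earlier terminal example in a periodic batch job. It assumes the `hashicorp/spark-nomad` image ships Spark under `/opt/spark` (as the `--distribution` flag suggests) and that the task can reach the Nomad API; the job, group, and task names are hypothetical.

```hcl
job "sparkpi-nightly" {
  datacenters = ["us-east-1"]
  type        = "batch" # periodic jobs must be batch jobs

  # Launch a new instance of this job every day at 02:00 (UTC by default)
  periodic {
    cron             = "0 2 * * *"
    prohibit_overlap = true # skip a launch while the previous run is still going
  }

  group "submit" {
    task "spark-submit" {
      driver = "docker"

      config {
        image = "hashicorp/spark-nomad"

        # Assumption: Spark lives under /opt/spark inside the image
        command = "/opt/spark/bin/spark-submit"
        args = [
          "--master", "nomad",
          "--docker-image", "hashicorp/spark-nomad",
          "--distribution", "local:///opt/spark",
          "--class", "org.apache.spark.examples.SparkPi",
          "local:/opt/spark/examples/jars/spark-examples_2.11-2.1.1.jar",
          "10"
        ]
      }
    }
  }
}
```

With `prohibit_overlap`, a slow run does not pile up behind the next scheduled launch.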

Today: Templated Spark Jobs
• spark-submit can take a Nomad job file (in JSON form) as a template
• Merges the generated Spark job into the template
• Fully customizable

Today: Templated Spark Jobs
• Run a logging sidecar to ship Spark logs
• Retrieve secrets securely from Vault
• Register Spark jobs in service discovery
• Customize any Nomad tunable

job "template" {
  group "driver" {
    task "driver" {
      meta { "spark.nomad.role" = "driver" }
    }

    task "log-forwarding-sidecar" {
      # sidecar task definition here
    }
  }

  group "executor" {
    task "executor" {
      meta { "spark.nomad.role" = "executor" }
    }

    task "log-forwarding-sidecar" {
      # sidecar task definition here
    }
  }
}

Terminal
$ ./bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master nomad \
    --docker-image hashicorp/spark-nomad \
    --distribution local:///opt/spark \
    --conf spark.nomad.job.template=template.json \
    local:/opt/spark/examples/jars/spark-examples_2.11-2.1.1.jar \
    10

job "template" {
  group "executor" {
    task "executor" {
      meta { "spark.nomad.role" = "executor" }

      # Render short-lived AWS credentials from Vault and export them as env vars
      template {
        data = <<END
{{ with $secret := secret "aws/creds/deploy" }}
AWS_ACCESS_KEY_ID={{ $secret.Data.access_key }}
AWS_SECRET_ACCESS_KEY={{ $secret.Data.secret_key }}
{{ end }}
END
        destination = "secrets/aws_creds"
        env         = true
      }

      vault {
        policies = ["s3-mydata-rw"]
      }
    }
  }
}
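The templated-jobs slide also lists registering Spark jobs in service discovery. A sketch of how a template could do that for the driver with Nomad's `service` stanza, which registers the task in Consul (this assumes a Consul agent is available to the Nomad client; the service name and tags are hypothetical):

```hcl
# Fragment of a Spark job template: only the driver group is shown
job "template" {
  group "driver" {
    task "driver" {
      meta { "spark.nomad.role" = "driver" }

      # Register the driver in Consul so other services can discover it.
      # The service name and tags are hypothetical.
      service {
        name = "spark-driver"
        tags = ["spark", "driver"]
      }
    }
  }
}
```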

Future: Pre-emption
• Job priorities: 1-100
• Run critical services at higher priority
• Run the Spark driver at higher priority than its executors
• Preempt lower-priority Spark executors
• Spark jobs still make progress
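Job priority already exists as a top-level job attribute (range 1 to 100, default 50). Because it is set per job, a Spark job's driver and executor groups currently share a single value. The fragment below is a sketch that assumes the Spark integration keeps a priority set in the job template, so Spark work ranks below services left at the default or set higher.

```hcl
# Fragment of a Spark job template; driver and executor groups omitted.
# Priority applies to the whole job, so both groups share this value.
job "template" {
  priority = 30 # below the default of 50
}
```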

Future: Quotas and Chargebacks
• Enable multi-tenant clusters
• Gate job submission on quota
• Prevent any one tenant from hogging the cluster
• Fine-grained chargebacks

Future: GPU
• Speed up Machine-Learning Tasks
• Nomad Clients detect GPUs
• Spark jobs can declare that they need GPU machines
• Other tasks on host won’t have access to GPU
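Later Nomad releases (0.9 and newer) implemented GPU scheduling through device plugins. Assuming a client running the NVIDIA device plugin, a task requests a GPU roughly like this (a fragment, not a complete job):

```hcl
task "executor" {
  resources {
    # Request one NVIDIA GPU; only clients that fingerprint a matching
    # device are eligible, and the GPU is assigned to this task alone
    device "nvidia/gpu" {
      count = 1
    }
  }
}
```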

Future: Over-Subscription
• Jobs declare their resource requirements
• But they often don’t use all of it
• e.g., ask for 4 GB of memory and use 1 GB
• Detect unused resources and make them available to batch jobs

Play with it!
• PR is out: https://github.com/apache/spark/pull/18209
• Docker Image: https://hub.docker.com/r/hashicorp/spark-nomad/

Thank You.
Twitter: @adadgar
GitHub: @dadgar
