Lambda architecture: from zero to One

Serhiy Masyutin

Me
• Staff Engineer @ Lohika
• Passionate Developer
• Father
• Mountain Biker

Agenda
• Project Overview
• Architecture Evolution
• What is Lambda Architecture?
• Cluster Evolution
• What Did We Achieve?

Project Goals
• Portfolio-driven R&D project
• Focus on Technology
• Focus on Knowledge
• Focus on a new remote Team

A service designed to offload the highly
concurrent scenario of live voting
• User casts a vote
• User requests campaign results
• Manager requests reports on campaigns
• Admin controls the system

Architecture Goals
• SaaS Solution
• High Throughput
• Scalability
• Low Latency

Essential Data Model
• campaign { startDate, endDate }
• vote { user, campaign, timestamp }
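A minimal Java 8 sketch of the two entities, assuming String identifiers and `java.time.Instant` timestamps (the slide does not give types). The `id` field, the constructor check and the voting-window rule are hypothetical additions, shown only to illustrate how startDate and endDate can constrain a vote.

```java
import java.time.Instant;

// Campaign entity from the slide; the id field is a hypothetical addition.
final class Campaign {
    final String id;
    final Instant startDate;
    final Instant endDate;

    Campaign(String id, Instant startDate, Instant endDate) {
        // Hypothetical invariant: a campaign must end after it starts.
        if (!startDate.isBefore(endDate)) {
            throw new IllegalArgumentException("startDate must precede endDate");
        }
        this.id = id;
        this.startDate = startDate;
        this.endDate = endDate;
    }

    // Assumption: votes are accepted in the half-open window [startDate, endDate).
    boolean isOpenAt(Instant t) {
        return !t.isBefore(startDate) && t.isBefore(endDate);
    }
}

// Vote entity from the slide; campaign holds the target Campaign's id.
final class Vote {
    final String user;
    final String campaign;
    final Instant timestamp;

    Vote(String user, String campaign, Instant timestamp) {
        this.user = user;
        this.campaign = campaign;
        this.timestamp = timestamp;
    }

    // Hypothetical rule: a vote counts only for its own campaign and inside its window.
    boolean isValidFor(Campaign c) {
        return c.id.equals(campaign) && c.isOpenAt(timestamp);
    }
}
```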

Start Simple
Java 8
Spring Boot 1.2.5
MariaDB 5.5
AngularJS 1.4

Benchmark it!
• Simple throughput scenario:
user.vote()
user.request(results)
• Stop tests when the error rate rises above 5%
• Benchmark tool runs locally, targeting a cloud server
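The stop criterion can be expressed as a small counter, sketched below under the assumption that every request is recorded as either succeeded or failed. `ErrorRateGuard` is a hypothetical name for illustration; it is not part of Gatling.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: tracks request outcomes and signals when the error rate is too high.
final class ErrorRateGuard {
    private final double threshold;   // e.g. 0.05 for 5%
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    ErrorRateGuard(double threshold) { this.threshold = threshold; }

    void record(boolean ok) {
        total.incrementAndGet();
        if (!ok) failed.incrementAndGet();
    }

    // The two counters are read separately, so under concurrent updates this is approximate.
    double errorRate() {
        long t = total.get();
        return t == 0 ? 0.0 : (double) failed.get() / t;
    }

    // Stop once the error rate rises strictly above the threshold.
    boolean shouldStop() { return errorRate() > threshold; }
}
```

A rate of exactly 5% does not trigger the stop; only the next failure that pushes the rate above 5% does.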

Gatling
• An open-source load testing framework based on Scala, Akka and Netty
• High performance
• Out-of-the-box HTTP support
• Ready-to-present HTML reports
• Scenario recorder and developer-friendly DSL
http://gatling.io

Gatling
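import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Each virtual user repeatedly casts a vote, then reads the campaign report.
// repeatCount, voteFeeder(), voteLink, reportByOptionLink, sentHeaders and token
// are defined elsewhere in the simulation.
val scn = scenario("Throughput simulation").repeat(repeatCount) {
  feed(voteFeeder())  // expected to provide the ${vote} and ${votingSchemaId} attributes
    .exec(http("Vote")
      .post(voteLink)
      .headers(sentHeaders).header("Authorization", token)
      .body(StringBody("${vote}"))
      .check(status.is(200)).asJSON)
    .exec(http("Report")
      .get(reportByOptionLink + "/${votingSchemaId}")
      .headers(sentHeaders).header("Authorization", token)
      .check(status.is(200)).asJSON)
}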

Benchmark!
Chart: requests per second vs. number of concurrent users (series: Initial, Initial no-joins)

Kafka
• Publisher-subscriber
• Distributed by design
• Scalable
• Fast
• Durable
http://kafka.apache.org
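The point of putting a queue in front of storage is that the vote request only has to enqueue, while persistence happens asynchronously. The sketch below shows that decoupling with an in-process `ArrayBlockingQueue` as a stand-in for a Kafka topic; it is not Kafka's client API and has none of Kafka's distribution or durability. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical in-process stand-in for an incoming Kafka topic.
final class IncomingVoteQueue {
    private final BlockingQueue<String> queue;

    IncomingVoteQueue(int capacity) { queue = new ArrayBlockingQueue<>(capacity); }

    // Request path: returns immediately; false means the queue is full and the vote is rejected.
    boolean publish(String voteJson) { return queue.offer(voteJson); }

    // Background consumer: takes up to maxBatch votes without blocking, in arrival order.
    List<String> drain(int maxBatch) {
        List<String> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);
        return batch;
    }
}
```

The HTTP thread never waits on the database; a separate consumer persists each drained batch.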

Benchmark!
Chart: requests per second vs. number of concurrent users (series: Initial, Initial no-joins, Incoming Queue)

Redis
• In-memory data structure store (sets, maps, etc.)
• Easy leaderboard implementation
• HyperLogLog is a native data structure
http://redis.io
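A rough in-process model of the Redis structures mentioned above, assuming one leaderboard per campaign: a vote increments an option's score (as `ZINCRBY` does on a sorted set), the top options are read highest first (as with `ZREVRANGE ... WITHSCORES`), and distinct voters are counted. The class is hypothetical; it uses an exact `Set` where Redis `PFADD`/`PFCOUNT` would give an approximate HyperLogLog count in small, bounded memory.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical in-process model of one campaign's Redis leaderboard.
final class CampaignLeaderboard {
    // Highest score first; ties broken by option name here (Redis's own tie order may differ).
    private static final Comparator<Map.Entry<String, Long>> BY_SCORE_DESC =
            Comparator.comparing((Map.Entry<String, Long> e) -> e.getValue())
                      .reversed()
                      .thenComparing(e -> e.getKey());

    // option -> score; plays the role of a sorted set updated with ZINCRBY
    private final Map<String, LongAdder> scores = new ConcurrentHashMap<>();
    // distinct voters; exact here, whereas PFADD/PFCOUNT would be approximate
    private final Set<String> voters = ConcurrentHashMap.newKeySet();

    void vote(String user, String option) {
        scores.computeIfAbsent(option, k -> new LongAdder()).increment();
        voters.add(user);
    }

    // Top n options with their scores, comparable to ZREVRANGE key 0 n-1 WITHSCORES.
    List<Map.Entry<String, Long>> top(int n) {
        List<Map.Entry<String, Long>> all = new ArrayList<>();
        for (Map.Entry<String, LongAdder> e : scores.entrySet()) {
            all.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue().sum()));
        }
        all.sort(BY_SCORE_DESC);
        return all.subList(0, Math.min(n, all.size()));
    }

    long uniqueVoters() { return voters.size(); }
}
```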

In-memory Storage
(diagram: Votes, Reports)

Benchmark!
Chart: requests per second vs. number of concurrent users (series: Initial, Initial no-joins, Incoming Queue, In-memory Storage)

Spark
• A fast and general engine for large-scale data processing
http://spark.apache.org

Scalable Processing
(diagram: Votes, Reports)

TODO: Benchmarks
• Processing latency
• Latency vs Data Volume

TODO: Scalable Storage
(diagram: Reports, Votes)

Architecture Goals Met
• High Throughput
• Scalable Storage
• Scalable Processing
• Extensible Processing
• Low Latency Reads & Updates

A Single Picture
http://lambda-architecture.net/img/la-overview_small.png

A Single Picture
QUERY = f_query(batch_view, realtime_view)
batch_view = f_batch(all_data)
realtime_view = f_speed(new_data, realtime_view)
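The three equations can be made concrete for per-campaign vote counts. The sketch assumes each vote is reduced to its campaign id; the names `fBatch`, `fSpeed` and `fQuery` simply mirror f_batch, f_speed and f_query and are not taken from the project's code. Because counts are additive, f_query here is a plain sum of the two views.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the Lambda Architecture equations for vote counts.
final class LambdaVoteCounts {
    // batch_view = f_batch(all_data): full recomputation over every stored vote
    static Map<String, Long> fBatch(List<String> allVoteCampaigns) {
        Map<String, Long> view = new HashMap<>();
        for (String campaign : allVoteCampaigns) view.merge(campaign, 1L, Long::sum);
        return view;
    }

    // realtime_view = f_speed(new_data, realtime_view): incremental update of the previous view
    static Map<String, Long> fSpeed(List<String> newVoteCampaigns, Map<String, Long> realtimeView) {
        Map<String, Long> next = new HashMap<>(realtimeView);
        for (String campaign : newVoteCampaigns) next.merge(campaign, 1L, Long::sum);
        return next;
    }

    // QUERY = f_query(batch_view, realtime_view): counts are additive, so merging is a sum
    static long fQuery(String campaign, Map<String, Long> batchView, Map<String, Long> realtimeView) {
        return batchView.getOrDefault(campaign, 0L) + realtimeView.getOrDefault(campaign, 0L);
    }
}
```

For non-additive results (for example, distinct voters) f_query would need a different merge, which is one reason the speed layer uses a separate incremental algorithm.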

Batch Layer
• Immutable append-only data store
• Batch computations produce batch views
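A minimal sketch of the batch layer's two properties, assuming an in-memory list in place of a distributed store such as HDFS (listed in the project's stack): records can only be appended, and a batch computation is a function over an immutable snapshot of everything stored so far. `MasterDataset`, `VoteRecord` and the distinct-voters view are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical master dataset: records are only ever appended, never updated or deleted.
final class MasterDataset<T> {
    private final List<T> records = new ArrayList<>();

    synchronized void append(T record) { records.add(record); }

    // Batch jobs read an immutable copy, so a running job never sees later writes.
    synchronized List<T> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(records));
    }

    // A batch computation is any function from the full dataset to a view.
    <V> V runBatch(Function<List<T>, V> fBatch) { return fBatch.apply(snapshot()); }
}

final class VoteRecord {
    final String user;
    final String campaign;
    VoteRecord(String user, String campaign) { this.user = user; this.campaign = campaign; }
}

final class BatchViews {
    // Example batch view (hypothetical): number of distinct voters per campaign.
    static Map<String, Integer> distinctVoters(List<VoteRecord> all) {
        Map<String, Set<String>> users = new HashMap<>();
        for (VoteRecord v : all) users.computeIfAbsent(v.campaign, k -> new HashSet<>()).add(v.user);
        Map<String, Integer> view = new HashMap<>();
        users.forEach((campaign, set) -> view.put(campaign, set.size()));
        return view;
    }
}
```

Because the raw data is never mutated, a buggy view can be fixed by correcting the function and rerunning it over the same snapshot.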

Serving Layer
• Random reads/queries on batch views
• Batch updates from batch layer
• No need for random writes
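The serving layer's contract (random reads, whole-view batch updates, no random writes) can be sketched with an `AtomicReference` that holds the current batch view and is replaced in one step when a new view arrives. This is an illustrative sketch with hypothetical names, not a specific serving database.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical serving layer: readers query the current batch view; the batch layer
// publishes a complete new view in one step. There is no per-key write method.
final class ServingLayer<K, V> {
    private final AtomicReference<Map<K, V>> current =
            new AtomicReference<>(Collections.<K, V>emptyMap());

    // Random read against whichever batch view is current.
    V get(K key) { return current.get().get(key); }

    // Batch update: copy, freeze, and atomically swap in the freshly computed view.
    void publish(Map<K, V> newBatchView) {
        current.set(Collections.unmodifiableMap(new HashMap<>(newBatchView)));
    }
}
```

Readers always see either the old view or the new one in full, never a half-updated mix.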

Batch + Serving Layer
• Robustness and fault tolerance
• Scalability
• Generalization
• Extensibility
• Minimal maintenance
• Debuggability

Speed Layer
• Low latency reads and updates
• Incremental computation (unlike the batch layer's full recomputation)
• Scalability
• Fault tolerance
• Minimal amount of stored data
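One way to keep the speed layer's stored data minimal is to group incremental counts into time buckets and drop buckets once a batch run has covered them. The sketch assumes fixed-size buckets and batch cutoffs aligned to bucket boundaries; `SpeedLayer` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical speed layer: per-campaign counts grouped into fixed-size time buckets.
final class SpeedLayer {
    private final long bucketMillis;
    // bucket start (epoch millis) -> campaign -> count
    private final TreeMap<Long, Map<String, Long>> buckets = new TreeMap<>();

    SpeedLayer(long bucketMillis) { this.bucketMillis = bucketMillis; }

    // Incremental update: a small, bounded amount of work per vote; nothing is recomputed.
    synchronized void onVote(String campaign, long timestampMillis) {
        long bucket = timestampMillis - Math.floorMod(timestampMillis, bucketMillis);
        buckets.computeIfAbsent(bucket, b -> new HashMap<>()).merge(campaign, 1L, Long::sum);
    }

    // Realtime count across all retained buckets.
    synchronized long count(String campaign) {
        long total = 0;
        for (Map<String, Long> counts : buckets.values()) total += counts.getOrDefault(campaign, 0L);
        return total;
    }

    // Called once the batch layer has absorbed all votes before cutoffMillis.
    // Drops every bucket [b, b + bucketMillis) that ends at or before the cutoff.
    synchronized void expireBefore(long cutoffMillis) {
        buckets.headMap(cutoffMillis - bucketMillis, true).clear();
    }
}
```

If a cutoff fell inside a bucket, that bucket would be kept and its earlier votes counted by both layers, hence the alignment assumption.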

Goals
• Robustness and fault tolerance
• Scalability
• Generalization
• Extensibility
• Minimal maintenance
• Debuggability
• Low latency reads and updates

Lambda Architecture
http://lambda-architecture.net/img/la-overview_small.png

Optimization:
Tomcat Connector
• Start with a single machine
• The number of threads matters: benchmark it
• Fine-tuning can be OS-specific

Benchmark!
Chart: requests per second vs. number of concurrent users (series: Initial, Initial no-joins, Incoming Queue, In-memory Storage, ???)

HAProxy
• The Reliable, High Performance TCP/HTTP Load Balancer
• A single-process program
http://haproxy.org
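A load balancer such as HAProxy spreads requests across the Tomcat nodes; with `balance roundrobin` in its configuration, for example, backends are used in turn. The Java sketch below illustrates only that selection rule, with hypothetical backend names, and ignores the weights and health checks that HAProxy adds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration of round-robin backend selection; not HAProxy code.
final class RoundRobinBalancer {
    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> backends) {
        if (backends.isEmpty()) throw new IllegalArgumentException("no backends");
        this.backends = Collections.unmodifiableList(new ArrayList<>(backends));
    }

    // floorMod keeps the index valid even after the counter overflows to negative values.
    String pick() {
        return backends.get(Math.floorMod(next.getAndIncrement(), backends.size()));
    }
}
```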

Optimization:
Load Balancing
Chart: gain by adding another server, showing requests per second and scaling factor vs. number of servers (dev, 1, 2, 3, 6)

When to Stop?
Component   CPU, %   Memory, GB
HAProxy         95          2.5
Tomcat         397          6
Kafka            1          1.3
Redis           55          3.5

Experience
• Lambda Architecture: we have One
• Cluster Scaling & Optimization
• Excellent team

Technology
Java 8
Spring Boot 1.2.5
Spring Data 1.2.5
Tomcat 8
MariaDB 5.5
HAProxy 1.5.14
Kafka 0.8
Redis 2.8
Spark 1.4
HDFS 2.6
Gatling 2.2
AngularJS 1.4

Things That Matter
• Small steps make a huge difference
• Choose the right metrics
• Benchmark
• Optimize!
