Daniel Hochman, Engineer @ Lyft
Geospatial Indexing at Scale
The 15 Million QPS Redis Architecture Powering Lyft
Agenda
Case Study: scaling a geospatial index
- Original Lyft architecture
- Migrating to Redis
- Iterating on data model
Redis on the Lyft platform
- Service integration
- Operations and monitoring
- Capacity planning
- Open source work and roadmap
Q&A
Lyft backend in 2012
API -> Monolith: PUT /v1/locations (5 sec loop)
Request:
{
  "lat": 37.61,
  "lng": -122.38
}
Response: 200 OK
{
  "drivers": [ ... ],
  "primetime": 0,
  "eta": 30,
  ...
}
Location stored on user record
Index on users collection: (driver_mode, region)
Monolithic architecture issues
Global write lock, not shardable, difficult refactoring, region boundary issues
# Fetch every available driver in the region from the users collection
drivers_in_region = db.users.find({
    "driver_mode": {"$eq": True},
    "region": {"$eq": "SFO"},
    "ride_id": {"$eq": None},
})
# Sort and filter by distance in process, then dispatch the nearest driver
eligible_drivers = sort_and_filter_by_distance(
    drivers_in_region, radius=.5
)
dispatch(eligible_drivers[0])
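The helper sort_and_filter_by_distance is not shown on the slide. A minimal sketch of what such a helper might look like, assuming each driver record is a dict with lat and lng keys, great-circle (haversine) distance, and a radius in miles. The record shape, the unit, and the extra origin parameter are all assumptions made for illustration:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; using miles is an assumption


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance between two lat/lng points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sort_and_filter_by_distance(drivers, origin, radius):
    """Keep drivers within `radius` miles of `origin`, nearest first.

    `origin` is a hypothetical parameter (the rider's position); the
    slide does not show where the reference point comes from.
    """
    scored = [
        (haversine_miles(origin["lat"], origin["lng"], d["lat"], d["lng"]), d)
        for d in drivers
    ]
    in_range = [(dist, d) for dist, d in scored if dist <= radius]
    in_range.sort(key=lambda pair: pair[0])
    return [d for _, d in in_range]
```

Every dispatch pays for the whole region: the database returns all available drivers and the distance work happens afterwards in the application.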
Unwritten architecture rule at Lyft
Horizontally scalable and highly available from day zero
Nutcracker overview
[Diagram: Application Instance -> nutcracker -> Redis Cluster (n shards)]
locations_cluster:
  listen: locations.sock
  distribution: ketama
  hash: md5
  eject_hosts: true
  failure_limit: 3
  servers:
    - 10.0.0.1:6379:1
    - 10.0.0.2:6379:1
    ...
    - 10.0.0.255:6379:1
Ketama provides consistent hashing!
Lose a node, only lose 1/n data
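The "lose a node, only lose 1/n data" property can be illustrated with a toy consistent-hash ring. This is a sketch in the spirit of ketama, not a byte-compatible reimplementation of it; the class name, the points_per_server value, and the key names are all assumptions:

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative, not real ketama)."""

    def __init__(self, servers, points_per_server=160):
        # Each server owns many pseudo-random points on the ring.
        ring = []
        for server in servers:
            for i in range(points_per_server):
                ring.append((self._hash("{}-{}".format(server, i)), server))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, key):
        # A key belongs to the first point clockwise from its hash.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


def moved_keys(before, after, keys):
    """Keys whose owning server differs between two rings."""
    return [k for k in keys if before.server_for(k) != after.server_for(k)]
```

Removing one server from the ring only remaps the keys that server owned; every other key keeps its owner, which is what makes host ejection tolerable.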
Pipelining
The application sends one pipeline to nutcracker:
PIPELINE (
  HGETALL foo
  SET hello world
  INCR lyfts
)
nutcracker:
1. hash the keys
2. send concurrent requests to backends
3. concatenate and return results
md5(foo) % 3 = 2
md5(hello) % 3 = 0
md5(lyfts) % 3 = 2
Shard 0: SET hello world
Shard 1: (no commands)
Shard 2: PIPELINE (
  HGETALL foo
  INCR lyfts
)
return ordered_results
RESPONSE (
  (k1, v1, k2, v2)
  OK
  12121986
)
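The three steps can be sketched in a few lines. This is an illustration of the idea, not nutcracker's implementation: it uses the simplified md5(key) % n routing drawn on the slide, and the backends are plain callables standing in for Redis connections (both are assumptions):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(key, n_shards):
    """Simplified md5(key) % n routing, as drawn on the slide."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_shards


def split_pipeline(commands, backends):
    """Route a pipeline across shards and return replies in request order.

    commands: list of tuples (name, key, *args).
    backends: list of callables; each takes a list of commands and
              returns one reply per command (stand-in for a Redis shard).
    """
    # 1. hash the keys, remembering each command's original position
    per_shard = {}
    for idx, cmd in enumerate(commands):
        per_shard.setdefault(shard_for(cmd[1], len(backends)), []).append((idx, cmd))

    def send(shard):
        batch = per_shard[shard]
        replies = backends[shard]([cmd for _, cmd in batch])
        return [(idx, reply) for (idx, _), reply in zip(batch, replies)]

    # 2. send concurrent requests to backends
    ordered_results = [None] * len(commands)
    with ThreadPoolExecutor(max_workers=max(len(per_shard), 1)) as pool:
        # 3. concatenate and return results in the original order
        for pairs in pool.map(send, per_shard):
            for idx, reply in pairs:
                ordered_results[idx] = reply
    return ordered_results
```

Commands that hash to the same shard stay pipelined together, so the caller sees one round trip even though several backends were involved.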
Parity data model
import json

location = json.dumps({'lat': 23.2, 'lng': -122.3, 'ride_id': None})
with nutcracker.pipeline() as pipeline:
    pipeline.set(user_id, location)  # string per user
    if driver_mode is True:
        pipeline.hset(region_name, user_id, location)  # hash per region
Problem: HGETALL on a region hash also fetches inactive drivers, which adds network and serialization overhead. The obvious fix? Expiration.
Hash expiration
Expiring a hash
# before: pipeline.hset(region_name, user_id, location)  (hash per region)
# after: hash per region per 30 seconds
now = now_seconds()  # read the clock once so the bucket is consistent
bucket = now - (now % 30)
hash_key = '{}_{}'.format(region_name, bucket)
pipeline.hset(hash_key, user_id, location)
pipeline.expire(hash_key, 15)
On read, HGETALL the current bucket plus the next and previous buckets to handle clock drift and the boundary condition, then merge the results in process.
Buckets: 12:00:00 12:00:30 12:01:00 12:01:30 ...
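A small sketch of the read side of the bucketed model, assuming 30-second buckets keyed as region_bucket like the write path. The function names are hypothetical, and the merge rule (a later bucket overwrites an earlier one) is an assumption, since the slide only says "merge in process":

```python
BUCKET_SECONDS = 30


def bucket_start(now, width=BUCKET_SECONDS):
    """Start of the bucket containing `now` (seconds since epoch)."""
    return now - (now % width)


def read_keys(region_name, now, width=BUCKET_SECONDS):
    """Hash keys to HGETALL: previous, current, and next bucket."""
    current = bucket_start(now, width)
    return ['{}_{}'.format(region_name, b)
            for b in (current - width, current, current + width)]


def merge_buckets(hashes):
    """Merge HGETALL results ordered oldest bucket first.

    Later buckets overwrite earlier ones, so each driver keeps the
    location from the most recent bucket it appears in.
    """
    merged = {}
    for bucket_contents in hashes:
        merged.update(bucket_contents)
    return merged
```

Reading three buckets triples the number of HGETALL calls per query, which is part of the cost of emulating per-entry expiration with whole-key expiration.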
Growing pains
- Region is a poor hash key (hot shards)
- Bulk expiration blocks Redis for longer than expected
- Redis used for inter-service communication with new dispatch system
Let's fix it!
Proper service communication
Redis is used for inter-service communication
- Replicate existing queries and writes in new service
- Replace function calls that query or write to Redis with calls to new service
- With contract in place between existing services and new service, refactor data model
- Migrate relevant business logic
Geohashing
Region is a poor hash key
Geohashing is an algorithm that provides arbitrary precision with gradual precision degradation
[Figure: a 3x3 grid of cells a through i; cell f is subdivided into cells fa through fi]
>>> compute_geohash(lat=37.7852, lng=-122.4044, level=9)
9q8yywefd
Google open-sourced an alternative geohashing algorithm, S2
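The slide does not show how compute_geohash is implemented. A sketch of the standard public geohash encoding, with a signature that mirrors the call on the slide (whether Lyft's helper works exactly this way is an assumption):

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def compute_geohash(lat, lng, level):
    """Encode a lat/lng pair as a geohash of `level` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    value, bits = 0, 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < level:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                value, lng_lo = (value << 1) | 1, mid
            else:
                value, lng_hi = value << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = (value << 1) | 1, mid
            else:
                value, lat_hi = value << 1, mid
        use_lng = not use_lng
        bits += 1
        if bits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[value])
            value, bits = 0, 0
    return "".join(chars)
```

Each extra character halves the cell several more times, so a shorter geohash is always a prefix of a longer one for the same point. That prefix property is the "gradual precision degradation" the slide refers to.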
Data model with geohashing
import json

loc = {'lat': 23.2, 'lng': -122.3, 'ride_id': None}
geohash = compute_geohash(loc['lat'], loc['lng'], level=5)
with nutcracker.pipeline() as pipeline:
    # string per user
    pipeline.set(user_id, json.dumps(loc))
    # sorted set per geohash with last timestamp
    if driver_mode is True:
        now = now_seconds()
        pipeline.zadd(geohash, {user_id: now})  # mapping form, redis-py >= 3.0
        pipeline.zremrangebyscore(geohash, '-inf', now - 30)  # expire!
Sorted set tells you where a driver might be. Use string as source of truth.
On query, look in neighboring geohashes based on desired radius.
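To make the "might be" versus "source of truth" distinction concrete, here is an in-memory sketch of the write and query paths. Plain dicts stand in for Redis strings and sorted sets. The class name, the choice to store the geohash inside the string value, and passing the neighboring cells in as a list are all assumptions made for illustration:

```python
import json

STALE_AFTER_SECONDS = 30


class InMemoryGeoIndex:
    """Dict-backed stand-in for the string + sorted set model."""

    def __init__(self):
        self.strings = {}      # user_id -> JSON location (source of truth)
        self.sorted_sets = {}  # geohash -> {user_id: last seen timestamp}

    def update(self, user_id, loc, geohash, now):
        self.strings[user_id] = json.dumps(dict(loc, geohash=geohash))
        cell = self.sorted_sets.setdefault(geohash, {})
        cell[user_id] = now
        # Same effect as ZREMRANGEBYSCORE geohash -inf (now - 30)
        for uid in [u for u, ts in cell.items() if ts <= now - STALE_AFTER_SECONDS]:
            del cell[uid]

    def query(self, geohashes, now):
        # Sorted sets only say where a driver might be.
        candidates = set()
        for gh in geohashes:
            for uid, ts in self.sorted_sets.get(gh, {}).items():
                if ts > now - STALE_AFTER_SECONDS:
                    candidates.add(uid)
        # Confirm each candidate against the per-user string.
        result = {}
        for uid in sorted(candidates):
            loc = json.loads(self.strings[uid])
            if loc["geohash"] in geohashes:
                result[uid] = loc
        return result
```

A driver who crosses a cell boundary briefly appears in two sorted sets. Checking the string drops the stale entry without waiting for the next cleanup in the old cell.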
Why not GEO?
Commands: GEOADD, GEODIST, GEOHASH, GEOPOS, GEORADIUS, GEORADIUSBYMEMBER
- Stable release in May 2016 with Redis 3.2.0
- Point-in-radius, position of key
- Uses geohashing and a sorted set under the hood
- No expiration or sharding
- No metadata storage
- Great for prototyping
Much more data model complexity behind the scenes
- Additional metadata
- Writing to multiple indices to lower cost of high frequency queries
- Balancing scattered gets and hot shards with geohash level
Redis on the Lyft platform
Cluster creation
Creating a new cluster is a self-service process that takes less than one engineer hour.
2015: 1 cluster of 3 instances
2017: 50 clusters with a total of 750 instances
Internal libraries
Golang and Python are the two officially supported backend languages at Lyft
Python features
- Fully compatible with redis-py StrictRedis
- Stats
- Retry
- Pipeline management for interleave and targeted retry
from lyftredis import NutcrackerClient
redis_client = NutcrackerClient('locations')
Observability
{% macro redis_cluster_stats(redis_cluster_name, alarm_thresholds) -%}
Capacity planning
Combine APIs and stats for global capacity plan
- Difficult to track 50 clusters
- Google Sheets API for display
- Internal stats for actual usage, EC2 API for capacity
- Automatically determine resource constraint (CPU, memory, network)
- Currently aim for 4x headroom due to difficulty of cluster resize
- At-a-glance view of peak resource consumption, provisioned resources, cost, resource constraint
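One way to read "automatically determine resource constraint" and "4x headroom" is sketched below. It assumes peak usage and provisioned capacity are available per resource in matching units; the function name, the input shape, and the definition of headroom as provisioned divided by peak are assumptions, not Lyft's actual tooling:

```python
HEADROOM_TARGET = 4.0  # the slide's stated goal


def capacity_report(peak_usage, provisioned):
    """Find the binding resource for a cluster and its headroom.

    Both arguments map a resource name (e.g. "cpu", "memory",
    "network") to an amount expressed in the same unit.
    """
    utilization = {r: peak_usage[r] / provisioned[r] for r in provisioned}
    # The constraint is whichever resource is closest to exhaustion.
    constraint = max(utilization, key=utilization.get)
    peak = peak_usage[constraint]
    headroom = provisioned[constraint] / peak if peak > 0 else float("inf")
    return {
        "constraint": constraint,
        "headroom": headroom,
        "meets_target": headroom >= HEADROOM_TARGET,
    }
```

Under these assumptions, a cluster peaking at 40% memory while CPU and network stay lower is memory-constrained with 2.5x headroom, below the 4x target.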
Object serialization
Benefits
- Lower memory consumption, I/O
- Lower network I/O
- Lower serialization cost to CPU
Serialized object size: 1012 bytes (original), 708 bytes (69%), 190 bytes (18%)
Nutcracker issues
- Deprecated internally at Twitter and unmaintained
- Passive health checks
- No hot restart (config changes cause downtime)
- Difficult to extend (e.g. to integrate with service discovery)
- When EC2 instance of Redis dies, we page an engineer to fix the problem
The road ahead
What is Envoy?
Open-source C++11 service mesh and edge proxy
As an advanced load balancer, provides:
- Service discovery integration
- Retry, circuit breaking, rate limiting
- Consistent hashing
- Active health checks
- Stats, stats, stats
- Tracing
- Outlier detection, fault injection
Envoy was designed to be extensible!
Introducing Envoy Redis
- In production at Lyft as of May 2017
- Support for INCR, INCRBY, SET, GET (ratelimit service)
- Service discovery integration (autoscaling!)
- Active healthcheck using PING (self-healing!)
- Pipeline splitting, concurrency
- Basic stats
Envoy Redis Roadmap
Additional command support
Error handling
Pipeline management features
Performance optimization
Replication
- Failover
- Mitigate hot read shard issues with large objects
- Zone local query routing
- Quorum
- Protocol overloading? e.g. SET key value [replication factor] [timeout]
More!
Q&A
- Thanks!
- Email technical inquiries to dhochman@lyft.com
- Participate in Envoy open source community! lyft/envoy
- Lyft is hiring. If you want to work on scaling problems in a fast-moving, high-growth company visit https://www.lyft.com/jobs

RedisConf17 - Lyft - Geospatial at Scale - Daniel Hochman
