Living with SQL and NoSQL at Craigslist
Jeremy Zawodny
craigslist
"There is no stack anymore..."
-- Mårten Mickos during Wednesday’s Keynote
Data Storage at craigslist
• MySQL
• Memcached
• Redis
• MongoDB
• Sphinx
• Filesystem
Choosing the Right Tool
• Durability
• Performance
• Query API
• Features
• Complexity
• Support
Request Flow (reads)
[Diagram: Posting, Search, and Browse requests]
• Browser, Load Balancer, Caching Proxy
  • Caching Proxy: Perl+epoll, Memcached (Proxy Cache)
• Web Server: Apache, mod_perl, Memcached (Posting Cache)
• Async Services: Perl+epoll, Memcached
• haproxy
• MongoDB: Archived Postings
• Sphinx: Live and Archived Postings
• MySQL: Live Postings
Request Flow (reads)
[Diagram: Image Requests]
• Browser, Load Balancer, Caching Proxy
  • Caching Proxy: Perl+epoll, Memcached (Proxy Cache)
• Image Storage: Apache, mod_perl, xfs+JBOD
Data Repositories
• MongoDB: OldPostings, Email Meta
• MySQL: Postings, Finance, Users, Misc Meta, Abuse, WorkQueue, Stats, Monitoring
• Filesystem: Images, Logs
• Memcached: Counters, Postings, Blobs, Objects
• Redis: Counters, Lists, Blobs, Monitoring, WorkQueue
• Sphinx: Postings, Internal, Forums, Archive
MySQL at craigslist
•   Vertical Partitioning: Clusters
    •   auth/users, abuse/spam, postings, finance
•   Sub-partitioning: Roles
    •   master, read, long read, dumper, thrash
•   Lots of SSD storage (mostly fusion-io)
    •   solved most of our performance problems
•   Few manual tasks
    •   re-cloning slaves, master swaps
MySQL at craigslist
• MySQL 5.5.x
 • hoping to move to 5.6.x
    • GTID + crash-safe slaves?!?!
• InnoDB almost everywhere
 • InnoDB compression where it works well
 • Large buffer pool (48GB common)
• haproxy sits between clients and servers
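The role split from the previous slide can be pictured as a small routing table. What follows is only a sketch of the idea in Python, not craigslist's haproxy configuration: the host names are hypothetical, and the per-role server counts simply mirror the postings cluster diagram.

```python
import itertools

# Roles come from the slides; every host name here is made up for illustration.
CLUSTER = {
    "master": ["db-master"],
    "read": ["db-read1", "db-read2", "db-read3", "db-read4"],
    "long read": ["db-long1", "db-long2"],
    "dumper": ["db-dumper"],
    "thrash": ["db-thrash"],
}


class RoleRouter:
    """Round-robin over the servers that fill a given role."""

    def __init__(self, cluster):
        # One independent cycle per role, so roles never share a cursor.
        self._cycles = {role: itertools.cycle(hosts) for role, hosts in cluster.items()}

    def pick(self, role):
        if role not in self._cycles:
            raise KeyError(f"unknown role: {role}")
        return next(self._cycles[role])


router = RoleRouter(CLUSTER)
print(router.pick("master"))                    # db-master
print([router.pick("read") for _ in range(5)])  # wraps back to db-read1 on the fifth pick
```

Keeping slow reporting queries on the "long read" servers and backups on the "dumper" means neither competes with the short reads that serve page traffic.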
MySQL at Craigslist
Postings Database Cluster
[Diagram]
• client(s) connect through haproxy
• Servers by role: write (1), read (4), long read (2), dumper (1), thrash (1)
Why MySQL?
•   It’s the devil we know!
    •   Very reliable
    •   Lots of Admin and Dev skills
•   Durability
•   Replication
•   Support
    •   Seriously, look at this ecosystem
•   Data Model
Why memcached?
• Wickedly Fast
• Stable
• Virtually zero administration required
• Easily co-exists with CPU-intensive services
• Multi-core? Run more instances!
Memcached at craigslist
• Primary cache for rendered pages
  (compressed and full), serialized objects, and
  misc. other data
• Used for lots of transient data blobs (and
  occasional counters)
• Custom async client library
 • Some key encoding issues
• Durability via client-side mirroring (think
  RAID-1)
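The RAID-1 analogy can be sketched as follows. This is an illustration of client-side mirroring in general, not the custom async client mentioned above: `FakeCache` is a hypothetical in-memory stand-in for a memcached instance, and the read policy (first mirror that answers with a value) is an assumption.

```python
class FakeCache:
    """In-memory stand-in for one memcached instance (no real server involved)."""

    def __init__(self):
        self.data = {}
        self.up = True

    def set(self, key, value):
        if not self.up:
            raise ConnectionError("cache node down")
        self.data[key] = value

    def get(self, key):
        if not self.up:
            raise ConnectionError("cache node down")
        return self.data.get(key)


class MirroredCache:
    """Write every key to all mirrors; read from the first mirror that has it."""

    def __init__(self, mirrors):
        self.mirrors = mirrors

    def set(self, key, value):
        written = 0
        for node in self.mirrors:
            try:
                node.set(key, value)
                written += 1
            except ConnectionError:
                continue  # a dead mirror must not fail the write
        return written

    def get(self, key):
        for node in self.mirrors:
            try:
                value = node.get(key)
            except ConnectionError:
                continue  # fall through to the next mirror
            if value is not None:
                return value
        return None


a, b = FakeCache(), FakeCache()
cache = MirroredCache([a, b])
cache.set("posting:123", "<html>...</html>")
a.up = False                     # simulate losing one instance
print(cache.get("posting:123"))  # still served from the surviving mirror
```

The cost is the same as RAID-1: every write is paid twice and usable capacity is halved, in exchange for a cache that stays warm when one instance dies.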
Redis at craigslist
• Primary repository of posting activity
  metadata used in analysis tasks
• Remote replication in 2nd data center
• 80+% of data in sorted sets (ZSETS)
• Sharded multi-node cluster
 • See: http://bit.ly/I4XUCj
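To make the sorted-set and sharding bullets concrete, here is a rough Python sketch. It does not talk to Redis: `FakeZSet` is a hypothetical stand-in that imitates the behavior of ZADD and ZRANGEBYSCORE, and hashing the key with CRC32 to choose a shard is an assumed scheme, not necessarily the one craigslist uses.

```python
import zlib
from bisect import insort


class FakeZSet:
    """Tiny imitation of a Redis sorted set: unique members ordered by score."""

    def __init__(self):
        self._scores = {}  # member -> score
        self._sorted = []  # (score, member) pairs kept in order

    def zadd(self, member, score):
        old = self._scores.get(member)
        if old is not None:
            self._sorted.remove((old, member))  # re-adding a member updates its score
        self._scores[member] = score
        insort(self._sorted, (score, member))

    def zrangebyscore(self, lo, hi):
        return [m for s, m in self._sorted if lo <= s <= hi]


def shard_for(key, num_shards):
    """Assumed scheme: stable hash of the key, modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class ShardedZSets:
    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]  # each maps key -> FakeZSet

    def zadd(self, key, member, score):
        shard = self.shards[shard_for(key, len(self.shards))]
        shard.setdefault(key, FakeZSet()).zadd(member, score)

    def zrangebyscore(self, key, lo, hi):
        zset = self.shards[shard_for(key, len(self.shards))].get(key)
        return zset.zrangebyscore(lo, hi) if zset is not None else []


# Hypothetical activity log: member = event, score = Unix timestamp.
store = ShardedZSets(num_shards=4)
store.zadd("activity:posting:42", "viewed", 1000)
store.zadd("activity:posting:42", "flagged", 1500)
store.zadd("activity:posting:42", "edited", 2000)
print(store.zrangebyscore("activity:posting:42", 1200, 2500))  # ['flagged', 'edited']
```

Using the timestamp as the score is what makes sorted sets a natural fit for activity metadata: "what happened to this posting between two points in time" becomes a single range query.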
Why Redis?
• Features
• Performance
• Flexible Persistence
• Excellent but simple API
• Project Vision
• Multi-core? Run more instances!
MongoDB at craigslist
•   Repository of 2.5+ billion archived postings
    •   growing and growing and growing
•   3 shards across 3-node replica sets
    •   duplicate config in 2nd data center
•   ~6TB of data, sized up to 12TB
•   Biggest challenge was data migration
•   Previous talks:
    •   http://bit.ly/HEYJ57 (before)
    •   http://bit.ly/Hr2qMf (after)
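The figures on this slide allow a quick back-of-the-envelope check. The arithmetic below assumes decimal terabytes and treats "2.5+ billion" as exactly 2.5 billion; the slide does not say whether ~6TB includes indexes or replica copies, so the per-posting number is only a rough average.

```python
postings = 2_500_000_000      # "2.5+ billion archived postings"
data_bytes = 6 * 10**12       # "~6TB of data" (decimal TB assumed)
capacity_bytes = 12 * 10**12  # "sized up to 12TB"

avg_bytes_per_posting = data_bytes / postings
headroom_factor = capacity_bytes / data_bytes
postings_at_capacity = capacity_bytes / avg_bytes_per_posting

print(avg_bytes_per_posting)  # 2400.0 bytes, roughly 2.4 KB per posting
print(headroom_factor)        # 2.0, so the archive can about double
print(postings_at_capacity)   # 5000000000.0 postings at the same average size
```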
Why MongoDB?
• Schema free
• Active community
• Commercial support
• Perl client!
• Ease of scaling
  • Yay! for built-in sharding support
• Fewer single points of failure
  • Replica sets are awesome
Sphinx at craigslist
• Full-text indexing and search of
 • all live postings
 • all archived postings
 • all forums (in progress)
• 300+ million daily queries
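For a sense of scale, the daily query figure converts to an average rate as follows. This assumes a perfectly flat load over 24 hours, which real traffic is not, so actual peaks would be higher by an amount the slides do not state.

```python
daily_queries = 300_000_000      # "300+ million daily queries"
seconds_per_day = 24 * 60 * 60   # 86,400

avg_qps = daily_queries / seconds_per_day
print(round(avg_qps))  # 3472 queries per second on average
```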
Why Sphinx?
• Performance
• Friendly API
• Flexibility in deployment model
• Commercial support
Filesystem at craigslist
• All uploaded images are stored in XFS
• Multiple image sizes, resized upon upload
Why Filesystem?
• Reliable (and Simple)
 • We use XFS for images and databases
 • Proven technology
• Fast
 • Some other filesystems have had
    performance issues
• Easy to move data around
• No other metadata/indexes to worry about
So Many Data Stores...
• Can be hard for developers if you don’t have
  good APIs or abstractions in place!
  • We built an object layer for our MongoDB
    migration
  • It speaks MySQL, Sphinx, MongoDB,
    Memcached
• Relational vs. Non-Relational?
  • In practice, we often just don’t care
  • NoSQL is a stupid label
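One way to picture such an object layer is a single lookup call that hides which store answered. The sketch below is an assumption-heavy illustration, not craigslist's code: plain dicts stand in for Memcached, MySQL (live postings), and MongoDB (archived postings), and the lookup order and the `PostingStore` name are hypothetical.

```python
class PostingStore:
    """Read-through lookup: cache first, then the live store, then the archive."""

    def __init__(self, cache, live, archive):
        self.cache = cache      # stand-in for Memcached
        self.live = live        # stand-in for MySQL (live postings)
        self.archive = archive  # stand-in for MongoDB (archived postings)

    def get(self, posting_id):
        key = f"posting:{posting_id}"
        hit = self.cache.get(key)
        if hit is not None:
            return hit
        for backend in (self.live, self.archive):
            doc = backend.get(posting_id)
            if doc is not None:
                self.cache[key] = doc  # populate the cache for the next reader
                return doc
        return None


cache = {}
live = {101: {"id": 101, "title": "bike for sale"}}
archive = {7: {"id": 7, "title": "old couch"}}
store = PostingStore(cache, live, archive)

print(store.get(101)["title"])  # bike for sale (from the live store)
print(store.get(7)["title"])    # old couch (from the archive)
print(store.get(999))           # None (not found anywhere)
```

Callers ask for a posting and never learn whether it came from a relational or a non-relational store, which is the practical meaning of "we often just don't care".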
Craigslist Tech FAQs
• Self-hosted (no virtualization or “cloud”)
• Mix of hardware (2 main vendors)
 • Blades
 • Larger multi-U multi-disk RAID boxes
• Mostly local storage (SAN for backups)
• Virtually all open source infrastructure
  tools
• Famously small (but growing) tech team
Craigslist is Hiring!
• Developers
 • Back-end
 • Front-end
• Systems Administrators
• Network Engineers
• Email: z@craiglist.org plain text resume!
