Polyglottany Is Not a Sin
           Eric Lubow
           @elubow
           elubow@simplereach.com
           #MongoBoston
Overview
•   SimpleReach
•   Definitions and Data Stores
•   Evolution to Polyglottany
•   Tie It Together
•   Final Thoughts
•   Questions

    Polyglottany Is Not A Sin     Eric Lubow   @elubow
Socially Intelligent


Size
•   150m events
    recorded per day
    and growing
•   600m Pageviews per
    month and growing
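These figures imply a sustained load. The sketch below converts them to average per-second rates. It assumes a 30-day month and perfectly even traffic. Real traffic is bursty, so peak rates will be noticeably higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(count: float, days: float) -> float:
    """Average rate, assuming the count is spread evenly over the period."""
    return count / (days * SECONDS_PER_DAY)

events_per_sec = per_second(150_000_000, days=1)      # 150m events per day
pageviews_per_sec = per_second(600_000_000, days=30)  # 600m pageviews per month (30-day month assumed)

print(f"{events_per_sec:,.0f} events/s, {pageviews_per_sec:,.0f} pageviews/s")
# prints "1,736 events/s, 231 pageviews/s"
```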




Polyglot Persistence
Polyglot Persistence, like polyglot programming, is all about
choosing the right persistence option for the task at hand.
                              http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence




Right Tool For The Job




Decisions. Decisions.
Data
•   What are my query patterns?
•   Is my data ingestion high volume/high velocity?
•   Am I batch loading data?
•   Am I write heavy or read heavy?
•   Are data relationships important?
•   Does my data need to be immediately available everywhere?
•   Do my display requirements call for realtime data?
•   Do I need to aggregate data on the fly?
•   Is my data structured or unstructured?
•   Does my data lend itself to a specific design pattern?

Tech
•   Is the encryption/authentication/authorization support sufficient for my needs?
•   Are there monitoring architectures already built?
•   Are there best practices guides already?
•   Will the data need to be distributed?
•   How fault tolerant is the system?
•   What supporting tools do I need?
•   Is there support for my language?

Financial
•   Am I cloud based?
•   Am I hardware based?
•   Am I a cloud/iron hybrid?
•   How much am I willing to spend?
•   How much am I willing to spend if something goes wrong?

Other
•   Do I have legal requirements (HIPAA/FIPS/Sarbanes-Oxley/PII)?
•   What kind of enterprise support is available?
•   What is the community like?
•   Does the product roadmap align with my roadmap?

No One Size Fits All




Tools



Free vs. Cost




Languages




Pre-Scale




SimpleReach Pre-Scale




Scale




SimpleReach



Mongo Conference




Cassandra
•   Large data volume ingestion at high velocity
•   Really fast writes to many locations (eventual
    consistency)
•   Query by column groups within rows (slicing)
•   OpsCenter
•   Data toolkit: more than a data storage layer
•   TTLs for small group aggregation
•   We wrote Helenus, a Node.js driver for Cassandra
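Slicing works because Cassandra keeps the columns within a row sorted by column name, so a contiguous range of columns can be read in one pass. The in-memory model below only illustrates that idea with Python's `bisect` module. It is not the Cassandra or Helenus API, and the row of hourly share counts is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

class WideRow:
    """In-memory model of one wide row whose columns are kept sorted by name."""

    def __init__(self) -> None:
        self._names: List[str] = []        # column names, always sorted
        self._values: Dict[str, int] = {}  # column name -> value

    def insert(self, name: str, value: int) -> None:
        # Writes are upserts: writing an existing column overwrites its value.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start: str, finish: str) -> List[Tuple[str, int]]:
        # Return every column whose name falls in [start, finish], in sorted order.
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]

row = WideRow()  # hypothetical row: hourly share counts for one article
for hour, shares in [("2013-04-01T09", 3), ("2013-04-01T10", 5),
                     ("2013-04-01T11", 7), ("2013-04-01T12", 2)]:
    row.insert(hour, shares)

print(row.slice("2013-04-01T10", "2013-04-01T11"))
# prints [('2013-04-01T10', 5), ('2013-04-01T11', 7)]
```

ISO-8601 timestamps are used as column names because they sort lexicographically in chronological order, which is what makes a time-range slice contiguous.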
MongoDB
•   Fast atomic increments (Node.js works natively with JSON)
•   Sharding
•   Solid ORM for Rails (Mongoid)
•   Fast access for pub/sub of durable/persisted documents
•   B-Tree Indexes
•   Document based via JSON
•   TTLs for ephemeral data
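Atomic increments make MongoDB a good fit for counters. The `$inc` operator is applied on the server, so concurrent writers never need a read-modify-write cycle in the application. The sketch below only builds the filter and update documents. The collection layout, with one document per URL and a nested `counts` map, is a hypothetical schema, and the PyMongo calls appear only in comments rather than being executed.

```python
from typing import Any, Dict, Tuple

def build_increment(url: str, network: str, amount: int = 1) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (filter, update) documents that bump one network's counter and the total."""
    filter_doc = {"_id": url}
    # Dotted paths address fields inside the nested "counts" sub-document;
    # $inc treats a missing field as 0 and creates it.
    update_doc = {"$inc": {f"counts.{network}": amount, "counts.total": amount}}
    return filter_doc, update_doc

# With PyMongo, assuming `coll` is a pymongo Collection:
#   f, u = build_increment("http://example.com/post", "twitter")
#   coll.update_one(f, u, upsert=True)  # upsert creates the document on the first hit
# A TTL index for ephemeral data (hypothetical date field "created_at"):
#   coll.create_index("created_at", expireAfterSeconds=3600)
```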

Redis
•   Supports hundreds of thousands of operations per second
•   Great caching engine
•   Supports useful data types such as sets, sorted sets, and lists
•   The entire dataset is kept in memory
•   Transactional and supports bulk operations
•   Centralized queueing and locking system
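For the locking use case, a common Redis pattern is `SET key token NX PX ttl`. The write succeeds only if the key is absent, and the expiry releases the lock if its holder dies. The sketch below follows redis-py's `set(..., nx=True, px=...)` and `eval` signatures. To stay runnable without a server, it is exercised against a hypothetical `FakeRedis` stand-in that implements just those two calls and ignores expiry. This is a single-instance lock, not a consensus-based distributed lock.

```python
import uuid
from typing import Any, Dict, Optional

# Lua script run atomically by Redis: delete the lock only if it still holds
# our token, so a worker cannot release a lock that expired and was re-acquired.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""

def acquire_lock(client: Any, name: str, ttl_ms: int) -> Optional[str]:
    """Try once to take the lock; return our token on success, None if it is held."""
    token = uuid.uuid4().hex
    # SET name token NX PX ttl_ms: only succeeds when the key does not exist,
    # and the key expires on its own if the holder crashes.
    if client.set(name, token, nx=True, px=ttl_ms):
        return token
    return None

def release_lock(client: Any, name: str, token: str) -> bool:
    """Release the lock if we still own it; True means it was deleted."""
    return bool(client.eval(RELEASE_SCRIPT, 1, name, token))

class FakeRedis:
    """Hypothetical in-memory stand-in for redis.Redis covering only the calls above.
    It ignores expiry and emulates RELEASE_SCRIPT instead of running Lua."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def set(self, name: str, value: str, nx: bool = False,
            px: Optional[int] = None) -> Optional[bool]:
        if nx and name in self.data:
            return None  # redis-py returns None when NX prevents the write
        self.data[name] = value
        return True

    def eval(self, script: str, numkeys: int, *keys_and_args: str) -> int:
        key, token = keys_and_args[0], keys_and_args[1]
        if self.data.get(key) == token:
            del self.data[key]
            return 1
        return 0

client = FakeRedis()  # swap in redis.Redis() against a real server
token = acquire_lock(client, "lock:report", ttl_ms=5000)
print(token is not None)                                  # True
print(acquire_lock(client, "lock:report", ttl_ms=5000))   # None: already held
print(release_lock(client, "lock:report", token))         # True
```

The random token is what keeps one worker from deleting a lock that another worker now holds. A plain `DEL` after a long pause could otherwise release someone else's lock.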


Infobright
•   Works with standard MySQL driver
•   Column store for ad-hoc analytics queries in SQL
•   Database built for business intelligence
•   Heavy compression of data
•   Pre-aggregated data (Knowledge Grid)
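The Knowledge Grid idea is to keep summary statistics for each pack of column values. With those statistics, many queries can skip packs entirely or be answered without reading the values inside them. The sketch below is a simplified model of that idea, not Infobright's implementation. `PackedColumn`, the min/max/sum statistics, and the tiny pack size are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PackStats:
    lo: int     # smallest value in the pack
    hi: int     # largest value in the pack
    total: int  # pre-computed sum of the pack

class PackedColumn:
    """A numeric column split into fixed-size packs, each with summary metadata."""

    def __init__(self, values: List[int], pack_size: int) -> None:
        self.packs = [values[i:i + pack_size] for i in range(0, len(values), pack_size)]
        self.stats = [PackStats(min(p), max(p), sum(p)) for p in self.packs]
        self.packs_scanned = 0  # packs whose individual values had to be read

    def sum_between(self, low: int, high: int) -> int:
        """SUM(value) WHERE value BETWEEN low AND high."""
        result = 0
        for pack, st in zip(self.packs, self.stats):
            if st.hi < low or st.lo > high:
                continue                 # no value can match: skip the pack
            if low <= st.lo and st.hi <= high:
                result += st.total       # every value matches: use the stored sum
                continue
            self.packs_scanned += 1      # partial overlap: read the values
            result += sum(v for v in pack if low <= v <= high)
        return result

col = PackedColumn(list(range(1, 13)), pack_size=4)  # packs: 1-4, 5-8, 9-12
print(col.sum_between(5, 10), col.packs_scanned)     # prints "45 1"
```

Only one of the three packs is actually read. The first is ruled out by its max, and the second is answered entirely from its stored sum.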




Ruby, Node.js, Python
•   Polyglottany doesn’t only apply to data stores
•   Each language pairs better with some data storage layers than others
•   Each language has its own individual benefits
•   JSON, APIs, Performance




Choice




Cons
•   Redis - Can only utilize a single core. SerDe (serialization/deserialization) cost.
•   MySQL Column Store - DELETEs/UPDATEs are VERY expensive
•   Cassandra - No B-tree indexes
•   Mongo - Indexes must fit in memory. Forced replica ping times
•   Python - Whitespace. Community
•   Ruby - Not high performance enough for our standards
•   JavaScript (Node.js) - Bad for CPU or IO intensive workloads


Tying It Together
Even with the right tools, 80% of the work of building a
big data system is acquiring and refining the raw data into
usable data.




Tying It Together




Tying It Together
•   Service Oriented Architecture (Internal API)
•   Data accuracy checks: visual and programmatic
•   Built a framework for testing storage engines
•   Access to many toolsets (for all languages and
    DBs)
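One way to build a framework for trying out storage engines is to put each candidate behind the same small interface. You then run an identical set of checks against every implementation. The sketch below assumes a counter workload. `CounterStore`, `DictStore`, `LogStore`, and `run_checks` are hypothetical names, and the two in-memory backends stand in for real database clients.

```python
from typing import Dict, List, Protocol, Tuple

class CounterStore(Protocol):
    """The minimal interface every candidate engine must implement."""
    def incr(self, key: str, amount: int) -> None: ...
    def get(self, key: str) -> int: ...

class DictStore:
    """Candidate 1: update counters in place."""
    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def incr(self, key: str, amount: int) -> None:
        self._counts[key] = self._counts.get(key, 0) + amount

    def get(self, key: str) -> int:
        return self._counts.get(key, 0)

class LogStore:
    """Candidate 2: append every increment and aggregate on read."""
    def __init__(self) -> None:
        self._log: List[Tuple[str, int]] = []

    def incr(self, key: str, amount: int) -> None:
        self._log.append((key, amount))

    def get(self, key: str) -> int:
        return sum(a for k, a in self._log if k == key)

def run_checks(store: CounterStore) -> List[str]:
    """Run the same workload against a fresh store; return failure descriptions."""
    failures = []
    store.incr("a", 2)
    store.incr("a", 3)
    store.incr("b", 1)
    for key, expected in [("a", 5), ("b", 1), ("missing", 0)]:
        actual = store.get(key)
        if actual != expected:
            failures.append(f"{type(store).__name__}: get({key!r}) = {actual}, expected {expected}")
    return failures

for engine in (DictStore(), LogStore()):
    print(type(engine).__name__, run_checks(engine) or "ok")
```

Because the checks depend only on the interface, adding another candidate means writing one adapter class. The test suite itself does not change.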


Service Architecture
[Diagram: Analytics (C*), Real-time (C*), and the Internal API]
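An internal API gives every consumer one entry point. Callers never need to know which data store answers a given query, and a backend can be swapped without touching them. The minimal routing sketch below illustrates this. The query types and the lambda handlers are hypothetical placeholders for real store clients.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]

class InternalAPI:
    """Routes each query type to the data store chosen to serve it."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def register(self, query_type: str, handler: Handler) -> None:
        self._routes[query_type] = handler

    def query(self, query_type: str, params: Dict[str, Any]) -> Dict[str, Any]:
        handler = self._routes.get(query_type)
        if handler is None:
            raise KeyError(f"no backend registered for {query_type!r}")
        return handler(params)

api = InternalAPI()
# Hypothetical handlers: in practice each would call a real client library.
api.register("realtime", lambda p: {"source": "cassandra", "url": p["url"]})
api.register("analytics", lambda p: {"source": "infobright", "report": p["report"]})

print(api.query("realtime", {"url": "http://example.com/post"})["source"])  # prints "cassandra"
```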


Distributed Architecture
           US-EAST-1a            US-EAST-1b           US-EAST-1e



       CASSANDRA-0001         CASSANDRA-0002       CASSANDRA-0003

       CASSANDRA-0010         CASSANDRA-0011       CASSANDRA-0012

          REDIS-0001A           REDIS-0001B

           MYSQL-0001                                MYSQL-0002

    MONGO-SHARD-0000-A                           MONGO-SHARD-0000-B

    MONGO-SHARD-0001-B      MONGO-SHARD-0001-A

                            MONGO-SHARD-0002-B   MONGO-SHARD-0002-A

            iAPI-0001             iAPI-0002            iAPI-0003
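A layout like this is easy to break as nodes are replaced. It can be worth checking automatically that no two members of a replica set share an availability zone. The sketch below transcribes the Redis and MongoDB nodes from the layout. It assumes, as the names suggest, that the trailing A/B marks members of the same replica set. The Cassandra, MySQL, and iAPI nodes are left out because the layout does not spell out their replication relationships.

```python
from collections import defaultdict
from typing import Dict, List

# Node -> availability zone for the Redis and MongoDB nodes in the layout.
PLACEMENT: Dict[str, str] = {
    "REDIS-0001A": "US-EAST-1a",
    "REDIS-0001B": "US-EAST-1b",
    "MONGO-SHARD-0000-A": "US-EAST-1a",
    "MONGO-SHARD-0000-B": "US-EAST-1e",
    "MONGO-SHARD-0001-A": "US-EAST-1b",
    "MONGO-SHARD-0001-B": "US-EAST-1a",
    "MONGO-SHARD-0002-A": "US-EAST-1e",
    "MONGO-SHARD-0002-B": "US-EAST-1b",
}

def replica_group(node: str) -> str:
    """Assumption: dropping the trailing member letter (and any '-') gives the set name."""
    return node[:-1].rstrip("-")

def colocated_groups(placement: Dict[str, str]) -> List[str]:
    """Return replica sets that have two or more members in the same zone."""
    zones: Dict[str, List[str]] = defaultdict(list)
    for node, zone in placement.items():
        zones[replica_group(node)].append(zone)
    return sorted(g for g, zs in zones.items() if len(set(zs)) < len(zs))

print(colocated_groups(PLACEMENT))  # prints []
```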

Points To Consider
•   Data consistency - Is the data the same in all data stores?
•   How important is data durability?
•   Managing many servers (Chef, AWS, CSSH)
•   Managing and learning many different applications
    and tuning each of them
•   Expertise
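When the same numbers live in several stores, a programmatic accuracy check can compare them key by key. The sketch below assumes per-key counts have already been pulled from each store into plain dicts, and it leaves out the fetching. It uses a relative `tolerance` so that small gaps from writes still propagating through an eventually consistent store are not flagged. The store names and sample counts are hypothetical.

```python
from typing import Dict, List, Tuple

def compare_counts(primary: Dict[str, int], secondary: Dict[str, int],
                   tolerance: float = 0.0) -> List[Tuple[str, int, int]]:
    """Return (key, primary_value, secondary_value) for every key whose counts diverge.

    tolerance is relative: 0.01 accepts up to a 1% difference, measured against
    the larger of the two values. A key missing from one store counts as 0.
    """
    mismatches = []
    for key in sorted(set(primary) | set(secondary)):
        a, b = primary.get(key, 0), secondary.get(key, 0)
        if abs(a - b) > tolerance * max(abs(a), abs(b)):
            mismatches.append((key, a, b))
    return mismatches

cassandra_counts = {"post-1": 100, "post-2": 50}           # hypothetical sample data
mongo_counts = {"post-1": 100, "post-2": 49, "post-3": 7}
print(compare_counts(cassandra_counts, mongo_counts))
# prints [('post-2', 50, 49), ('post-3', 0, 7)]
print(compare_counts(cassandra_counts, mongo_counts, tolerance=0.05))
# prints [('post-3', 0, 7)]
```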



Expertise
•   What happens when you need help?
•   How do you become experts?
•   What happens when you need more experts?




Summary
•   Polyglottany is not a sin
•   Know your data read/write
    patterns
•   Know the tools available to you
•   Know your compromises
•   Expertise


We’re Hiring




Questions are guaranteed in life.
Answers aren’t.
               Eric Lubow
               @elubow
               elubow@simplereach.com
               #MongoBoston

               Thank you.


Editor's Notes

  • #4: SimpleReach is a social intelligence tool for content creators. We track every social action, on every major network, across the entire web in real time. That means every like, tweet, pin, stumble and many more.