Real-time Location Based Social Discovery using MongoDB




         Fredrik Björk
       Director of Engineering
           MongoSV, Dec 4th 2012
What is Banjo?
• The most powerful location-based mobile
  technology that brings you the moments
  you would otherwise miss
• Aggregates geotagged posts from
  Facebook, Twitter, Instagram and
  Foursquare in real time
Stats
•   Launched June 2011
•   3 million users
•   Social graph of 400 million profiles
•   50 billion connections
•   ~200 geo posts created per second




Why MongoDB?
• Developer friendly
• Easy to maintain and scale
• Automatic failover
• Rapid prototyping of features
• Good fit for consuming, storing and
  presenting JSON data
• Geospatial features out of the box


Infrastructure
• ~160 EC2 instances (75% MongoDB, 25%
  Redis)
• SSD storage for low latency
• App servers (Sinatra & Rails) hosted on
  Heroku
• mongos routers, with authentication
  enabled, running on dedicated servers



Geotagged posts
• Consumed as JSON from social network
  APIs via streaming, polling and real-time
  callbacks
• Exposed via REST APIs as JSON to the
  Banjo iOS and Android apps
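A minimal Ruby sketch of the ingestion step, assuming a tweet payload shaped like Twitter's v1.1 JSON (`id_str`, `user.screen_name`, `text`, and a GeoJSON `coordinates` point in [longitude, latitude] order). The method name `normalize_tweet` is hypothetical; the output mirrors the post document layout shown on the schema slides.

```ruby
require "json"

# Hypothetical normalizer: turns one tweet (JSON string) into the post
# document layout from the schema slides. Returns nil for tweets without a
# point geometry, since only geotagged posts are kept.
def normalize_tweet(json)
  tweet = JSON.parse(json)
  point = tweet["coordinates"]
  return nil unless point && point["type"] == "Point"

  lng, lat = point["coordinates"] # GeoJSON order: [longitude, latitude]
  {
    "_id"         => "2:#{tweet.fetch("id_str")}", # 2 = Twitter provider code
    "username"    => tweet.dig("user", "screen_name"),
    "text"        => tweet["text"],
    "coordinates" => [lat, lng] # stored as [latitude, longitude], as on the slides
  }
end

sample = <<~JSON
  {"id_str": "262989592561606656",
   "user": {"screen_name": "fbjork"},
   "text": "Will give a presentation at #MongoSV",
   "coordinates": {"type": "Point", "coordinates": [-122.438212, 37.784234]}}
JSON

post = normalize_tweet(sample)
puts post["_id"]      # prints 2:262989592561606656
p post["coordinates"] # prints [37.784234, -122.438212]
```

Dropping non-point tweets at the edge keeps every stored document eligible for the geo index.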




Schema design




https://twitter.com/fbjork/status/262989592561606656




• _id combines a provider code (Facebook:
  1, Twitter: 2, etc.) with the provider's
  post ID, so IDs stay unique across networks

          https://twitter.com/fbjork/status/262989592561606656


> db.posts.find({ _id: '2:262989592561606656' })

{
    _id: "2:262989592561606656",
    username: "fbjork",
    text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
    ...
}
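The composite key can be built and parsed with a pair of small helpers. This is a sketch: the method names are hypothetical, and only the two provider codes given on the slide (Facebook: 1, Twitter: 2) are included, since the codes for the other networks are not listed.

```ruby
# Provider codes from the slide; codes for the other networks are not listed
# there, so they are deliberately left out of this hypothetical table.
PROVIDER_CODES = { facebook: 1, twitter: 2 }.freeze

# Builds a composite _id such as "2:262989592561606656".
def build_post_id(provider, native_id)
  code = PROVIDER_CODES.fetch(provider) # KeyError for an unknown provider
  "#{code}:#{native_id}"
end

# Splits a composite _id back into [provider, native_id].
def parse_post_id(post_id)
  code, native_id = post_id.split(":", 2)
  [PROVIDER_CODES.key(Integer(code)), native_id]
end

puts build_post_id(:twitter, "262989592561606656") # prints 2:262989592561606656
p parse_post_id("2:262989592561606656")            # prints [:twitter, "262989592561606656"]
```

Because `_id` is always uniquely indexed, ingesting the same post twice (for example from both a streaming and a polling source) collides on the key instead of creating a duplicate document.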
• Coordinates are stored in an array as
  [latitude, longitude] (MongoDB's docs
  recommend [longitude, latitude], which
  2dsphere indexes require)


{
    _id: "2:262989592561606656",
    username: "fbjork",
    text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
    coordinates: [37.784234,-122.438212],
    ...
}
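The slides store [latitude, longitude]. MongoDB's `2dsphere` indexes and GeoJSON points use the opposite order, [longitude, latitude], so any later move to those would need a conversion. A minimal sketch (the method name is hypothetical):

```ruby
# Converts a stored [latitude, longitude] pair into a GeoJSON Point, whose
# coordinates are ordered [longitude, latitude]. The range checks catch a
# swapped pair whenever its longitude falls outside +/-90.
def lat_lng_to_geojson(lat_lng)
  lat, lng = lat_lng
  raise ArgumentError, "latitude out of range: #{lat}" unless lat.between?(-90, 90)
  raise ArgumentError, "longitude out of range: #{lng}" unless lng.between?(-180, 180)

  { "type" => "Point", "coordinates" => [lng, lat] }
end

point = lat_lng_to_geojson([37.784234, -122.438212])
p point["coordinates"] # prints [-122.438212, 37.784234]
```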




• The author's friend IDs are denormalized
  into an array on each post



{
    _id: "2:262989592561606656",
    username: "fbjork",
    text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
    coordinates: [37.784234,-122.438212],
    friend_ids: [8816792, 10324882, 2006261, ...]
}
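A sketch of how a post could pick up its `friend_ids` at write time, assuming the author's friend list is available from some social-graph lookup. The `SOCIAL_GRAPH` hash and `attach_friend_ids` method are hypothetical stand-ins; the slides do not say where the graph is read from.

```ruby
# Hypothetical stand-in for the social graph: author username => friend IDs.
SOCIAL_GRAPH = {
  "fbjork" => [8816792, 10324882, 2006261]
}.freeze

# Copies the author's friend IDs onto the post, so "posts by my friends"
# becomes a query on one collection instead of a join against the graph.
def attach_friend_ids(post, graph = SOCIAL_GRAPH)
  post.merge("friend_ids" => graph.fetch(post["username"], []).uniq)
end

post = attach_friend_ids({
  "_id"         => "2:262989592561606656",
  "username"    => "fbjork",
  "coordinates" => [37.784234, -122.438212]
})
p post["friend_ids"] # prints [8816792, 10324882, 2006261]
```

The trade-off is the usual one for denormalization: a later change to the author's friend list is not reflected in posts that were already stored.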




Geospatial Indexing
• Create the geo index:


> db.posts.ensureIndex( { coordinates: '2d' } )




Find nearby posts in Miami:



> db.posts.find( { coordinates: { $near: [25.792627,-80.226142] } } )


{ _id: "2:809438082", coordinates: [25.792610,-80.226100], username: "Rebecca_Boorsma", text: "I love Miami!", ... }

{ _id: "2:1234567", coordinates: [25.781324,-80.431423], username: "foo", text: "Another day, another dollar", ... }
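On a `2d` index, `$near` orders results by flat (Euclidean) distance between coordinate pairs, measured in the units of the stored values, here degrees. A hypothetical in-memory model of that ordering, using the two Miami documents above (the `limit` default of 100 is a choice of this sketch):

```ruby
# In-memory stand-in for a $near query on a 2d index: sorts documents by
# planar distance from the query point and keeps the closest `limit`.
def near(docs, point, limit: 100)
  qx, qy = point
  docs
    .map { |d| [Math.hypot(d["coordinates"][0] - qx, d["coordinates"][1] - qy), d] }
    .sort_by(&:first)
    .first(limit)
    .map(&:last)
end

posts = [
  { "_id" => "2:1234567",   "coordinates" => [25.781324, -80.431423] },
  { "_id" => "2:809438082", "coordinates" => [25.792610, -80.226100] }
]

near(posts, [25.792627, -80.226142]).each { |d| puts d["_id"] }
# prints 2:809438082, then 2:1234567
```

Because the distance is planar, a degree of longitude counts the same as a degree of latitude even though it covers less ground away from the equator; `$nearSphere` computes spherical distances instead.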




Find friend posts globally:



> db.posts.find({ friend_ids: { $in: [2006261] } })


{
    _id: "2:10248172",
    username: "fbjork",
    friend_ids: [8816792, 10324882, 2006261, ...],
    ...
}
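`$in` against an array field matches a document when any array element equals any listed value. A hypothetical in-memory model of that rule (the method name and the second sample document are made up for illustration):

```ruby
require "set"

# In-memory stand-in for db.posts.find({ friend_ids: { $in: ids } }).
# When the field holds an array, $in matches if ANY element equals ANY of ids.
# Each candidate's whole array may be walked, so cost grows with array length.
def friend_posts(docs, ids)
  wanted = ids.to_set
  docs.select { |d| Array(d["friend_ids"]).any? { |f| wanted.include?(f) } }
end

posts = [
  { "_id" => "2:10248172", "username" => "fbjork", "friend_ids" => [8816792, 10324882, 2006261] },
  { "_id" => "2:1234567",  "username" => "foo",    "friend_ids" => [42] }
]

p friend_posts(posts, [2006261]).map { |d| d["_id"] } # prints ["2:10248172"]
```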




Find friend posts in a location:



> db.posts.find({ coordinates: { $near: [25.792627,-80.226142] },
                  friend_ids: { $in: [2006261] } })


{
    _id: "2:10248172",
    username: "fbjork",
    friend_ids: [8816792, 10324882, 2006261, ...],
    ...
}
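Combining the two conditions, the geo index supplies candidates nearest-first and the `$in` test filters them. A hypothetical in-memory model (the sample data is made up for illustration, and the `limit` default is a choice of this sketch):

```ruby
require "set"

# In-memory stand-in for
#   db.posts.find({ coordinates: { $near: point }, friend_ids: { $in: ids } })
# Candidates are ordered nearest-first, as a 2d index would supply them, the
# $in condition is checked on each, and the first `limit` matches are kept.
def nearby_friend_posts(docs, point, ids, limit: 100)
  qx, qy = point
  wanted = ids.to_set
  docs
    .sort_by { |d| Math.hypot(d["coordinates"][0] - qx, d["coordinates"][1] - qy) }
    .select { |d| Array(d["friend_ids"]).any? { |f| wanted.include?(f) } }
    .first(limit)
end

posts = [
  { "_id" => "2:10248172",  "coordinates" => [25.80, -80.23],         "friend_ids" => [8816792, 2006261] },
  { "_id" => "2:809438082", "coordinates" => [25.792610, -80.226100], "friend_ids" => [42] },
  { "_id" => "2:111",       "coordinates" => [25.79, -80.22],         "friend_ids" => [2006261] }
]

p nearby_friend_posts(posts, [25.792627, -80.226142], [2006261]).map { |d| d["_id"] }
# prints ["2:111", "2:10248172"]
```

Since the geo condition already narrows the work to nearby posts, the array check only runs on those candidates; its cost per candidate still grows with the length of `friend_ids`.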



Compound geo indexes
• Create a compound index on coordinates
  and friend_ids:

> db.posts.ensureIndex( { coordinates: '2d', friend_ids: 1 } )




• Fails for compound indexes with large
  arrays
• Geospatial index keys have a size limit of
  1000 bytes

> db.posts.ensureIndex( { coordinates: '2d', friend_ids: 1 } )


Error: Key too large to index
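To see why a friend list trips the limit, here is a rough estimate of how many bytes an array of integer IDs takes when encoded as BSON. Index keys in MongoDB 2.x used their own key format, so treat this as an approximation; the 1000-byte threshold is the figure from the slide, and the method name is hypothetical.

```ruby
# Rough BSON size of an array of integer IDs: a 4-byte length prefix, then per
# element a type byte, the decimal index as a NUL-terminated key, and a 4-byte
# (int32) or 8-byte (int64) payload, then a trailing NUL byte.
def bson_int_array_size(ids)
  body = 0
  ids.each_with_index do |id, i|
    payload = id.between?(-2**31, 2**31 - 1) ? 4 : 8
    body += 1 + i.to_s.bytesize + 1 + payload
  end
  4 + body + 1
end

LIMIT = 1000 # bytes, the figure quoted on the slide

# Smallest number of int32-sized friend IDs whose encoding exceeds LIMIT.
n = (1..).find { |k| bson_int_array_size(Array.new(k, 2006261)) > LIMIT }
puts n # prints 123
```

Under this estimate, any author with more than about 120 friends already produces an array larger than the limit on its own.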




Geospatial query performance
• Do we need a compound index at all?
• The geospatial index alone is usually
  selective enough
• Problem: matching $in against large arrays
  is CPU-hungry, since each candidate's
  array must be traversed
• Solution: pre-sharded array fields




Pre-sharded array fields
• When dealing with large arrays, e.g.
  @BarackObama's follower IDs
• Split each array across several fields
  (pre-sharding)
• shard = Hash(key) MOD shard_count
• Keep array sizes in the low hundreds




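# shard_example.rb
require 'zlib'

SHARDS = 3
friend_ids = [1000, 1001, 1002, 1003, 1004, 1005, 1006]

# Print each friend ID's shard: crc32(id) MOD SHARDS
friend_ids.each { |f| puts Zlib.crc32(f.to_s) % SHARDS }

# Output:
# 0
# 2
# 0
# 2
# 1
# 2
# 0

# Resulting pre-sharded fields:
{
    friends_0: [1000, 1002, 1006],
    friends_1: [1004],
    friends_2: [1001, 1003, 1005]
}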
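The loop above only prints shard numbers. A sketch that builds the `friends_N` fields directly, and reports which field a given friend ID belongs in, could look like this; `preshard` and `shard_field` are hypothetical names, while the CRC32 hash and `friends_N` naming follow the slide.

```ruby
require "zlib"

# Field that holds a given friend ID: friends_<crc32(id) MOD shard_count>.
def shard_field(friend_id, shard_count)
  "friends_#{Zlib.crc32(friend_id.to_s) % shard_count}"
end

# Groups a flat friend list into pre-sharded fields. Shards that receive no
# IDs are simply absent from the result.
def preshard(friend_ids, shard_count)
  friend_ids.group_by { |id| shard_field(id, shard_count) }
end

p preshard([1000, 1001, 1002, 1003, 1004, 1005, 1006], 3)
```

The shard count has to stay fixed once posts are written: changing it moves IDs to different fields, so existing documents would no longer match queries built with the new count.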

Find friend posts using the pre-sharded
friend arrays (friend 1000 hashes to shard 0):


> db.posts.find({ coordinates: { $near: [25.792627,-80.226142] },
                  friends_0: { $in: [1000] } })

{
    friends_0: [1000, 1002, 1006],
    friends_1: [1004],
    friends_2: [1001, 1003, 1005]
}
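On the application side, the query has to be routed to the right field before it is sent. A sketch of a query builder (the method name is hypothetical); with the official `mongo` gem, a hash like this would be passed to `Mongo::Collection#find`.

```ruby
require "zlib"

SHARDS = 3 # must match the count used when the posts were written

# Hypothetical query builder: routes a friend ID to its friends_N field so the
# $in check scans one small bucket instead of the whole friend list.
def friend_posts_near_query(point, friend_id, shard_count = SHARDS)
  field = "friends_#{Zlib.crc32(friend_id.to_s) % shard_count}"
  { "coordinates" => { "$near" => point }, field => { "$in" => [friend_id] } }
end

query = friend_posts_near_query([25.792627, -80.226142], 1000)
p query
```

This covers the single-ID case shown on the slide; looking up several friends at once means grouping their IDs by field first.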




Capped collections
• Good fit for storing a feed of posts for a
  period of time
• Eliminates need to expire old posts
• Updates can’t increase a document’s size
• Documents can’t be deleted
• Resizing means recreating the collection
• Can’t be sharded
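These trade-offs follow from how a capped collection behaves: a fixed-size, insertion-ordered buffer. A hypothetical in-memory model (the class names are illustrative, not a driver API); real capped collections are bounded by a byte size plus an optional document count, while this sketch bounds by document count only.

```ruby
# Raised when code tries to delete from the capped feed.
class CappedDeleteError < StandardError; end

# In-memory sketch of a capped feed: insertion order is preserved, the oldest
# entry is overwritten once the feed is full, and deletes are refused.
class CappedFeed
  def initialize(max_docs)
    @max_docs = max_docs
    @docs = []
  end

  def insert(doc)
    @docs.shift if @docs.size >= @max_docs # oldest post falls off
    @docs << doc
    self
  end

  def delete(_doc)
    raise CappedDeleteError, "documents cannot be deleted from a capped collection"
  end

  def to_a
    @docs.dup
  end
end

feed = CappedFeed.new(2)
feed.insert("post 1").insert("post 2").insert("post 3")
p feed.to_a # prints ["post 2", "post 3"]
```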


TTL collections
• We switched to collections with TTL
  indexes (introduced in MongoDB 2.2)
• Deleting and growing documents is now
  possible
• Easier to change expiration times
• Can be sharded (though the geo field
  can't be the shard key)
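A TTL index marks a document for removal once a date field is older than the index's `expireAfterSeconds`. A hypothetical in-memory model of that rule; the `created_at` field name is an assumption, since the slides don't name the date field.

```ruby
# Sketch of TTL-index expiry: with an index like
#   { created_at: 1 } and expireAfterSeconds: ttl
# a document becomes eligible for deletion once created_at + ttl has passed.
# MongoDB's background TTL task removes eligible documents periodically
# (about once a minute), so expired posts can linger briefly.
def expired?(doc, now, ttl_seconds)
  doc.fetch("created_at") + ttl_seconds <= now
end

def sweep(docs, now, ttl_seconds)
  docs.reject { |d| expired?(d, now, ttl_seconds) }
end

now = Time.utc(2012, 12, 4, 12, 0, 0)
posts = [
  { "_id" => "2:1", "created_at" => now - 7200 }, # two hours old
  { "_id" => "2:2", "created_at" => now - 600 }   # ten minutes old
]
p sweep(posts, now, 3600).map { |d| d["_id"] } # prints ["2:2"]
```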




Questions




Thank you!


Available on iPhone and Android
fredrik@teambanjo.com
@fbjork
