www.objectrocket.com
Exploring MongoDB and
Elasticsearch
DeveloperWeek Austin 2017
Kimberly Wilkins
Principal Engineer
Databases
@dba_denizen
/wilkinskimberly
Current Areas of Interest
• NoSQL – MongoDB, Elasticsearch, etc.
• Streaming, real-time analytics
• AR/VR/MR – Augmented, Virtual and
Mixed Reality technologies
• Machine Learning – Deep Learning
• Cryptocurrencies, Blockchain
• Teaching, helping, raising up others
MongoDB &
Elasticsearch
Better Together? Yes!
Overview
• Definitions
• Current versions
• Features
• Architectural basics
• Use cases:
Best, Worst, Together
Squirrel
Why Do It?
The blue data highway… bulging at the seams.
So Many Forms… As Many Impacts
New technologies, new industries, new uses…
Data is Coming From Everywhere
Sensors, IoT
Data is Coming From Everywhere
“Big data is like teenage sex:
everyone talks about it,
nobody really knows how to
do it, everyone thinks
everyone else is doing it, so
everyone claims they are
doing it…”
-Dan Ariely, Duke University
Remember
• Hold the data
• Find the data fast
• Stream the data between data stores
• Process the data along the way
• Analyze the data
• Understand where the data comes from
Why?
• Faster, more flexible development
• Lower $ (hardware, software, deployment)
• Performance (faster writes, faster reads)
• Developers (“Schemaless”, cool toys)
• More devs than DBAs, DevOps, SREs…
• Variety of NoSQL technologies
MongoDB &
Elasticsearch
Better Together? Yes!
MongoDB
"MongoDB (from humongous) is a free and open-source
cross-platform document-oriented database program.
Classified as a NoSQL database program, MongoDB
uses JSON-like documents with schemas."
– straight from Wikipedia
• #1 NoSQL
• #5 Overall
Features: MongoDB
Document store
collections vs tables; document or objectId’s
Easy for developers – more devs than DBAs and Ops
Flexible data types
Unstructured & structured data
De-normalized
Duplicate data is OK
Index intersections, partial indexes, aggregation pipelines
$lookup improvements coming in 3.6 (November): single db call; updating arrays
Scales vertically or horizontally - sharding
MongoDB Architectural Basics
• Faster, more flexible development
• Built-in Replication via Replica sets
• HA/DR throughout stack, components
• Scaling via Sharding
• DR via use of Multiple Data Centers
• Delayed and/or hidden secondaries
• https://www.objectrocket.com/files/objectrocket-for-mongodb-white-paper.pdf
Basic MongoDB Architecture
Single replica set (diagram): one primary and two secondaries, connected by heartbeats
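A replica set like the one pictured elects its primary by a majority vote of its voting members, which is why three members is the usual minimum. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and nothing here talks to a real MongoDB deployment.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    # Compare a few set sizes: size, votes needed, members that may fail.
    for size in (3, 4, 5):
        print(size, majority(size), fault_tolerance(size))
```

Going from three to four members raises the votes needed without raising the number of failures tolerated, which is one reason odd-sized sets are commonly preferred.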
Sharded Cluster (diagram)
Client drivers -> MongoS tier (routers: multiple MongoS instances) -> MongoD tier (replica sets)
Shard 1, Shard 2, Shard 3: each a replica set with one primary and two secondaries
Config servers (metadata): Config 1, Config 2, Config 3 (replica set, 3.2)
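The router tier in the diagram uses the metadata held by the config servers to decide which shard should receive an operation. The toy sketch below illustrates that idea for a range-partitioned shard key. The chunk boundaries, shard names and `route` function are all made up for illustration, and this is not how mongos is actually implemented.

```python
from bisect import bisect_right

# Hypothetical chunk metadata, sorted by lower bound.
# Each entry is (inclusive lower bound of the chunk, shard that owns it).
CHUNKS = [
    ("", "shard1"),   # keys from "" up to, but not including, "h"
    ("h", "shard2"),  # keys from "h" up to, but not including, "q"
    ("q", "shard3"),  # keys from "q" upward
]


def route(shard_key: str, chunks=CHUNKS) -> str:
    """Return the shard whose chunk range contains shard_key."""
    lower_bounds = [low for low, _ in chunks]
    # Find the last chunk whose lower bound is <= shard_key.
    index = bisect_right(lower_bounds, shard_key) - 1
    return chunks[index][1]


if __name__ == "__main__":
    for key in ("alice", "henry", "zed"):
        print(key, "->", route(key))
```

A query that includes the shard key can be sent to a single shard this way; a query without it has to be broadcast to every shard.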
MongoDB Architecture - Advanced
• Multiple Storage Engine Options
• HA/DR throughout stack, components
• Scaling via Sharding
• DR via use of Multiple Data Centers,
delayed/hidden
• Percona Server for MongoDB - has features from
MongoDB Enterprise edition (e.g. security)
Best Use Cases
• User Data - games, chat, social media
• Mobile Analytics, Engagement/Campaigns
• Aggregation Summaries
• Product Catalogs
• Inventory Management
• Shopping Carts
• Content Management Systems - Sitecore
1000 x
Elasticsearch
“Elasticsearch is a distributed, JSON-
based search and analytics engine
designed for horizontal scalability,
maximum reliability, and easy
management.”
– straight from Elastic.co website
Elasticsearch Architectural Basics
● Cluster - A collection of Elasticsearch nodes of various roles
↳ Nodes - Elasticsearch processes that perform one or more roles
  ● Roles are: master, data, ingest, coordinating-only (client)
  ● Nodes can operate in any combination of roles, or in all of them
↳ Indexes - A collection of data (like databases/collections)
  ● Can be combined in queries with wildcards and aliases
  ● Fields in an index have an unchangeable data type (mapping)
↳ Shards - Slices of the index data
  ● Unlike many databases, automatically constructed (not key based)
  ● A replica is just a read-only copy of a shard
↳ Segments - Lucene’s chunks of data
  ● Automatically built as data is indexed
  ● Docs are not deleted, just marked as deleted (can be optimized/merged)
↳ Documents - A JSON entry in the index
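"Automatically constructed (not key based)" means that you do not choose a shard key: by default Elasticsearch routes each document to a primary shard using a hash of its routing value (the document ID unless overridden), modulo the number of primary shards. The sketch below shows the shape of that calculation. Note the assumption: `zlib.crc32` is only a stand-in, since Elasticsearch uses its own hash function internally, so the shard numbers printed here will not match a real cluster.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Illustrative routing: hash(routing value) modulo number of primary shards.

    zlib.crc32 is a stand-in hash chosen for this sketch only.
    """
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def distribution(doc_ids, num_primary_shards: int) -> list:
    """Count how many of the given document IDs land on each shard."""
    counts = [0] * num_primary_shards
    for doc_id in doc_ids:
        counts[pick_shard(doc_id, num_primary_shards)] += 1
    return counts


if __name__ == "__main__":
    ids = [f"doc-{n}" for n in range(1000)]
    print(distribution(ids, 3))
```

Because the result depends on the number of primary shards, that number is fixed when the index is created; changing it means reindexing.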
Elasticsearch vs. Elastic Stack
• Don’t be confused!
• Elasticsearch vs. Elastic Stack
• The Open Source Elastic Stack is a suite of
tools/apps associated with and working in
conjunction with Elasticsearch to complete a variety
of analytics tasks.
Elastic Stack Ecosystem
Basic Elastic Architecture
3 nodes, 1 replica, 1 master: fewer nodes, more resources per node, each shard performs better
3 nodes, 2 replicas, 1 master: more nodes, needs more HW resources, but increases search performance for the index and improves redundancy
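The trade-off between the two layouts comes down to how many shard copies the cluster has to host. Here is a rough sketch of that arithmetic, together with the index settings body that would express each layout. The choice of three primary shards is an assumption made for the example, and the helper names are hypothetical.

```python
def index_settings(primaries: int, replicas: int) -> dict:
    """Body for creating an index with the given shard layout."""
    return {
        "settings": {
            "number_of_shards": primaries,    # primary shards
            "number_of_replicas": replicas,   # replica copies per primary
        }
    }


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary plus each of its replica copies."""
    return primaries * (1 + replicas)


def max_useful_replicas(data_nodes: int) -> int:
    """Copies of the same shard are never placed on the same node,
    so replicas beyond (data nodes - 1) cannot be assigned."""
    return max(data_nodes - 1, 0)


if __name__ == "__main__":
    for replicas in (1, 2):
        print(index_settings(3, replicas), total_shard_copies(3, replicas))
```

With three data nodes and two replicas, every node ends up holding a copy of every shard, which is where the extra redundancy and search capacity come from, at the cost of more disk and memory.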
Best Use Cases
• Full and Fuzzy Text Searches (true strength: speed)
• Geo and Range related searches
• Visualizing Data – with other Elastic Stack
components, e.g. Kibana
• Logging and Log Analysis (in place of Splunk)
• Scraping and Combining Public Data Sources
• Event and Data Metrics
Geo Queries – Social Media – Near Me
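A "near me" search of this kind is typically expressed in Elasticsearch as a `geo_distance` filter. The sketch below only builds the request body. The field name `location` is an assumption (it would have to be mapped as a `geo_point`), and the coordinates are just an example point in Austin.

```python
def near_me_query(lat: float, lon: float, distance: str = "5km",
                  field: str = "location") -> dict:
    """Request body that keeps only documents within `distance` of a point.

    `field` is a hypothetical geo_point field name.
    """
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(near_me_query(30.2672, -97.7431, "2km"), indent=2))
```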
Visualization with Kibana
MongoDB | Elastic (Elasticsearch)
General-purpose document store DB, server-side scripts, some aggregation pipelines. OLTP = good, REPORTING = not as good. Simple = good, Complex = good, Very Complex = not as good | Full-text search engine: fuzzy text search, geo near, keyword, real-time analytics, indexer; distributed; Java based with Lucene under the covers
Current version: 3.4.10 (released on Halloween). Recommended: 3.4.8 or 3.4.9 | Current version: 5.6.1, September 18, 2017 (new; kinks from the 5.5.3 release of September 11, 2017). Recommended and available: 5.5.1, July 25, 2017
Schemaless **#! Structured, unstructured, semi-structured | Schemaless **#! Structured, unstructured, semi-structured
JSON, BSON docs | JSON
Sharding to scale | Sharding/nodes to scale
HA via replica sets (1 primary, 2 secondaries – or more with quorum) | HA via replicas (1 master, x replicas)
Limited index intersection v2.6+, very large indexes still ehh | 1 query can use multiple indexes
Great general-purpose NoSQL DB, for processing and filtering during query & data retrieval | Processing via index builds, stores in multiple versions. Great at indexing; great at searching big datasets
Now Combine Them
Like tacos
and tequila
Combining – in general
• Database has many indexes or very large indexes
• Data has lots of arrays – queries that require many
different $and clauses on a field with an array as a value
• SPEED up fuzzy and/or full text searches – ‘chicken’
ex. db.articles.find({ $text: { $search: "chi" } })
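MongoDB's `$text` search matches whole (stemmed) words, so searching for "chi" would not be expected to find "chicken". On the Elasticsearch side, partial and misspelled input can be handled with a prefix-style or fuzzy match. The sketch below only builds the two request bodies; the field name `body` and the helper names are assumptions for illustration.

```python
def prefix_search(field: str, text: str) -> dict:
    """Match documents where a word in `field` starts with the last term of `text`."""
    return {"query": {"match_phrase_prefix": {field: text}}}


def fuzzy_search(field: str, text: str) -> dict:
    """Match documents containing `text`, tolerating small typos."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": "AUTO",  # edit distance chosen from term length
                }
            }
        }
    }


if __name__ == "__main__":
    print(prefix_search("body", "chi"))      # intended to reach 'chicken'
    print(fuzzy_search("body", "chiken"))    # intended to tolerate the typo
```

Either body would be sent to the search endpoint of the index that mirrors the MongoDB collection; how the two stores are kept in sync is outside this sketch.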
MongoDB & Elasticsearch +
Elasticsearch
• Primarily Search Engine
• Scalable, distributed
• Horizontal scaling
• JSON
• Schemaless*
• Based on Lucene
• Support for Python, JS, .Net, Scala, Perl, php, Ruby
• 3rd Party Product Integration
Kafka, others
• Primarily for streaming, for moving data between data stores; used with other components and data techs to create near-real-time and very-near-real-time event analytics; append only
• Horizontal scaling
• JSON
• Schemaless*
• Parallel Processing
• 3rd Party Product Integration
MongoDB
• Primarily OLTP
• Scalable, distributed
• Vertical or horizontal scaling
• Binary JSON
• Schemaless*
• Rapid prototyping
• Event Logging
• Social Media
• Content management
• User Data and Actions
• NOT in-depth analysis
MongoDB & Elasticsearch @ObjectRocket
Currently:
• MongoDB metrics
• Centralized logging
• MongoDB data visualization
• Network monitoring
• Website search
• Business metrics
• Elasticsearch metrics
Potential New Use 1 – Bitcoin Time Interval Tracking
Bitcoin ticker data: interval tracking and analysis
MongoDB
• Simple and complex queries
• Aggregations at any stage
Elasticsearch
• Speed up queries – faster results
• Store frequent queries for re-use via indexes
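On the MongoDB side, interval tracking of ticker data could be sketched as an aggregation pipeline that groups ticks into fixed-width time buckets. This is one possible shape, not the pipeline from the talk. It assumes each tick document has a numeric `ts` field (epoch milliseconds) and a numeric `price` field; both names are hypothetical.

```python
def bucket_start(ts_ms: int, interval_ms: int) -> int:
    """Start of the interval containing ts_ms (same arithmetic as the pipeline)."""
    return ts_ms - ts_ms % interval_ms


def interval_pipeline(interval_ms: int) -> list:
    """Aggregation pipeline producing open/high/low/close per interval."""
    bucket = {"$subtract": ["$ts", {"$mod": ["$ts", interval_ms]}]}
    return [
        {"$sort": {"ts": 1}},          # so $first/$last mean earliest/latest tick
        {"$group": {
            "_id": bucket,             # interval start, in epoch milliseconds
            "open": {"$first": "$price"},
            "high": {"$max": "$price"},
            "low": {"$min": "$price"},
            "close": {"$last": "$price"},
        }},
        {"$sort": {"_id": 1}},         # return intervals in time order
    ]


if __name__ == "__main__":
    one_minute = 60_000
    print(bucket_start(125_000, one_minute))
    print(interval_pipeline(one_minute))
```

The resulting list can be passed to a driver's `aggregate` call on the ticks collection; the per-interval summaries are the kind of output that could then be indexed in Elasticsearch for fast repeated lookups.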
Potential New Use 1 cont’d – Bitcoin Time Interval Tracking
Potential New Use 2 – Cryptocurrency Platform/Trading
• Cryptocurrency Trading Platform - ex. tribeca
• node.js – v7.8 or higher
• MongoDB database – for persistence, aggregations
• Elasticsearch – the ‘need for speed’: rapid-fire
executions required – sub-millisecond trades & cancellations
Potential New Use 3 – Social Media App Searching
• Searching large Social Media Apps for frequently
searched items – popular quarterbacks & receivers
on fantasy football sites, wines in comments
• MongoDB’s $text operator is special - it cannot be
used more than once in a query; no use with $nor,
etc.
ex. db.comments.find({ $and: [ { $text: { $search: "win" } },
{ $text: { $search: "red" } } ] }) – WON’T WORK!
Not in MongoDB alone, but it can when you combine it with Elasticsearch.
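In Elasticsearch the same intent, "documents that match both terms", can be expressed by putting several `match` clauses under a `bool` query's `must`. The following is a minimal sketch of the request body; the field name `comment` and the helper name are assumptions for illustration.

```python
def all_terms_query(field: str, terms) -> dict:
    """Request body matching documents whose `field` matches every term."""
    return {
        "query": {
            "bool": {
                # Every clause under "must" has to match.
                "must": [{"match": {field: term}} for term in terms]
            }
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(all_terms_query("comment", ["win", "red"]), indent=2))
```

Unlike `$text`, there is no limit of one full-text clause per query here, and the clauses can also be negated by moving them under `must_not`.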
Potential New Use 4 – Machine Learning, Deep Learning
Architecture and Streaming
Platform – Jay Kreps
• Apps/DB’s->data in
• Aggregations at any stage
• Further Queries
• Faster Queries via ES
• Results back into DB’s
• Algorithms applied
• Endless … Limitless …
Device events, time series,
event logs, AR/VR/MR
Links
• MongoDB to analyze cryptocurrency price swings and intervals:
https://medium.com/@serbanmihai/aggregate-mongodb-data-with-node-js-and-mongoose-cryptocurrency-financial-time-series-ae739b4c9485
• MongoDB with node.js – Cryptocurrency trading platform:
https://github.com/michaelgrosner/tribeca
• Arctic, MongoDB and Python – Cryptocurrency database:
https://mxbu.github.io/logbook/2017/06/04/use-arctic-to-create-cryptocurrency-database/
• AI/ML/DL - Jay Kreps article: Architecture and Streaming Platform for AI, Deep Learning,
Database Pipeline, Models, Events, etc.:
https://www.oreilly.com/ideas/apache-kafka-and-the-four-challenges-of-production-machine-learning-systems
We are Hiring!
Join a dynamic and
innovative team!
objectrocket.com/careers
Consultations Available
sales@objectrocket.com
objectrocket.com/customers/
View Customer Stories
Trial & Migrations
always free
objectrocket.com
Thank You!
DeveloperWeek Austin 2017
Kimberly Wilkins
Principal Engineer
Databases
@dba_denizen
/wilkinskimberly


Exploring MongoDB & Elasticsearch: Better Together


Editor's Notes

  • #4: MongoDB is somewhat the de facto general-purpose NoSQL DB, and it has added enough new features and made enough improvements to stay at the top of the NoSQL offerings. Elastic is moving up, and it can do things fast. As our world expands and changes, the potential use cases for combining data stores – MongoDB and Elasticsearch – also grow. But before we can talk about those current and potential use cases for combining them, we should take a quick look at what each of them is and when to use each individually.
  • #5: 2 mins
  • #6: People wanted Big Data to go away; they wanted to call it other things, or NOT call it things, or whatever… EoT, IoT, IIoT. But it’s not going to…
  • #8: -Internet of Things / Everything / Industrial IIOT - logs, events, - 2019 ~$1.7 TRILLION $$ -Monitoring and managing those has sprung up whole companies now – -Augmented Reality AR VR MR - THE FUTURE – the next iphone level CHANGE Manufacturing, Training,
  • #9: Sorry, not sorry - still love this quote after all of the years. But the truth remains: more and more and more data points require THINGS (applications, data stores) to manage them.
  • #10: We NEED something to hold the data, to find the data fast, to SHARE the data and MOVE it from one APP to another, to process and transform it along the way, and to analyze it for MEANING.
  • #11: NEVER truly schemaless though… If you are NOT thinking about app design before you actually start designing it, you FAIL. You are just storing data that will likely never be used, and your new shiny NoSQL datastore will just become a data wasteland. Next: MongoDB and Elastic, then MongoDB solo.
  • #12: Keep them tied together here: MongoDB is more or less the de facto general-purpose NoSQL DB, and it has added enough new features and made enough improvements to stay at the top of the NoSQL offerings. Elastic is moving up, and it can do things fast.
  • #13: IF something comes straight from Wikipedia it HAS to be true. MongoDB is the de facto general-purpose NoSQL DB. #5 datastore technology overall and holding steady there. #1 NoSQL database product.
  • #14: MongoDB has the market share and the community buy-in to make the difference in supportability, so it usually takes the prize unless you have a really, really write-heavy application. Community support and development efforts: drivers, etc. Built-in sharding/scaling via replica sets. High writes and heavy reads can be somewhat mutually exclusive. MongoDB scales nearly linearly for heavy read workloads.
  • #15: 3.4.10 as of Halloween - since released on Halloween, would avoid ;-) no tricks please - 3.4.9 considered a minor release overall but … But what does it look like really? Architecture overview next
  • #16: 1 primary, 2 secondaries: heartbeat communication for up/down state, replication to the secondaries via the oplog. MongoDB has the same kind of potential to scale UP instead of OUT. **NOTE: many people run MongoDB on dedicated, larger bare-metal hosts and grow by scaling up vertically. However, if they continue to grow, they will run into many of the same challenges that traditional RDBMSs have. So what about scaling OUT with Mongo? Religious war here.
  • #17: MongoD’s: the data nodes. The shards are replica sets (a primary and 2 secondary members). MongoS’s: the query routers. They talk to the config servers and to the MongoD data nodes, getting location metadata from the config servers so they can route each query to the correct shard and return the result. It is good design to have multiple mongoS query routers in a sharded cluster; our environments have 4. Config servers: the data dictionary of Mongo. They contain the cluster/shard metadata, i.e. the mapping of the data set. On 3.0 and below, always keep exactly and ONLY 3 for PROD env’s. On 3.2 and up, the config servers are by default a replica set, which is NOW required to be WT; this improves the consistency of the info in the chunk map, aka where the data extents reside. If you lose or corrupt your configs, the mongoS will not know where the data resides, so it can’t retrieve it… so it is effectively lost.
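The routing role described in this note can be sketched as a toy model. This is not MongoDB's implementation: the chunk boundaries, the shard names and the `route` helper are all made up for illustration. It only shows the idea that a query router needs the config servers' chunk map to find the shard that holds a given shard-key value, and that without the map the data cannot be located.

```python
from bisect import bisect_right

# Hypothetical chunk map, standing in for the metadata held by config servers.
# Each entry is (lower bound of a shard-key range, shard that owns the range).
# Entries are sorted by lower bound; -inf plays the role of the minimum key.
CHUNK_MAP = [
    (float("-inf"), "shard0"),
    (1000, "shard1"),
    (5000, "shard2"),
]


def route(shard_key_value, chunk_map):
    """Return the shard whose chunk range contains shard_key_value."""
    if not chunk_map:
        # No metadata: the router has no way to know where the data lives.
        raise LookupError("no chunk metadata: cannot locate the data")
    lower_bounds = [lower for lower, _ in chunk_map]
    # The owning chunk is the last one whose lower bound is <= the value.
    index = bisect_right(lower_bounds, shard_key_value) - 1
    return chunk_map[index][1]


if __name__ == "__main__":
    for key in (42, 1000, 4999, 5000):
        print(key, "->", route(key, CHUNK_MAP))
```

Calling `route(42, [])` raises `LookupError`, which mirrors the point of the note: the documents may still sit on the shards, but nothing can find them.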
  • #18: Too much to cover other than a mention for you to look up later. WT: the new default, also required for the config server replica set (vs 3 single db’s as before). MMAP: still good for larger result sets or smaller, more frequent write activities, specifically updates, unless you have a lot of CPU and cores to throw at it for WT usage. Reminder to talk about the Percona version that allows us to offer security features that usually only come with the more expensive Enterprise version: SSL, Kerberos, LDAP integration. *** Our experience there.
  • #19: User data in games. Inventory management: update, decrease, increase inventory. Shopping carts: tales of the long query and 1000 pairs of shoes. CMS: our expertise running Sitecore on Azure.
  • #21: A search engine but a whole lot more MUCH more powerful than JUST a search engine GeoAnalytics - Geo near me
  • #22: Basically clusters with nodes holding indexes, then split across hosts with shards holding slices of data, held in segments at the Lucene chunk level. Composed of the data via documents written in JSON.
  • #24: There are lots of reasons to use multiple components of the Elastic Stack, including for visualization, which we will talk about a bit later. But 1st let’s talk about just Elasticsearch.
  • #25: With Elastic, to increase in scale and add more performance, you increase the replication factor. Basically ADD NODES: this increases HW resources to improve search performance and improve redundancy. The number of replica shards can be changed dynamically on a live cluster, allowing us to scale up or down as demand requires, and Elastic will automatically redistribute as needed. Example: nine shards, three primaries and six replicas (two replicas per primary). This means that we can scale out to a total of nine nodes, again with one shard per node. This would allow us to triple search performance compared to our original three-node cluster.
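The shard arithmetic in this note can be checked with a short sketch. The helper names are made up for illustration; the settings body uses Elasticsearch's `number_of_replicas` index setting, which can be changed on a live index through the index settings update API (`PUT /<index>/_settings`). Nothing here talks to a cluster, it only builds the numbers and the request body.

```python
def total_shards(primaries, replicas_per_primary):
    """Primaries plus all of their replica copies."""
    return primaries * (1 + replicas_per_primary)


def replica_settings(replicas_per_primary):
    """Request body for PUT /<index>/_settings to change the replica count."""
    return {"index": {"number_of_replicas": replicas_per_primary}}


if __name__ == "__main__":
    # The example from the note: three primaries, two replicas of each.
    shards = total_shards(primaries=3, replicas_per_primary=2)
    print("total shards:", shards)
    # With one shard per node, this is also the largest useful node count.
    print("max nodes at one shard per node:", shards)
    print("settings body:", replica_settings(2))
```

The number of primary shards is fixed when an index is created, which is why the replica count is the knob turned here.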
  • #26: Logging and log analysis. Basically taking over for Splunk, which has become too expensive.
  • #27: Elasticsearch has made massive improvements to its geospatial capabilities in the last 2 releases. It way outperforms the geospatial abilities of MongoDB’s $geoNear and within operators, which is why you would look to combine them; we will talk about that later on. But there are other good uses of Elasticsearch combined with elements of its Elastic STACK.
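As a rough illustration of the "geo near me" style of search mentioned in these notes, the sketch below builds an Elasticsearch `geo_distance` filter as a plain dictionary. The field name `location`, the helper name and the coordinates are assumptions for the example; the field would need to be mapped as `geo_point` in the index, and the body would be sent to the index's `_search` endpoint.

```python
def near_me_query(lat, lon, distance="5km", field="location"):
    """Build a search body matching documents within `distance` of a point.

    `field` is a hypothetical geo_point field name chosen for this example.
    """
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Approximate coordinates of downtown Austin, used only as sample input.
    print(near_me_query(30.2672, -97.7431, distance="2km"))
```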
  • #28: Other good uses of Elasticsearch combined with elements of its Elastic STACK. BUT now to summarize those 2: MongoDB and Elasticsearch.
  • #29: Summarize those 2: both store data objects that have key-value pairs, and both allow querying that body of objects. But they come from 2 different camps and are made for different purposes. Elastic: great with full-text and fuzzy text searching. Slow when adding ‘new’ data, aka creating new indexes. Uses indexes to help you find the data fast. Completes complex search queries quickly. Interacts well directly with other associated technologies (Kibana, Beats, Logstash, etc.) and with other NoSQL and SQL DB’s.
  • #30: In the end, it is about the ability to store data, aggregate things, pass it along. Then ANALYZE and USE that data analysis for whatever purpose you desire So let’s look at these 2 together now
  • #31: - When your data has a lot of arrays: to perform queries that require many different $and clauses on a field with an array as a value. MANY smaller shards, as they need additional write scopes. 2nd case - Fuzzy: if you want to do a search on the word chicken in a menu application:
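A fuzzy search of the kind this note describes could look roughly like the sketch below. It builds an Elasticsearch `match` query with the `fuzziness` option as a plain dictionary; the field name `item_name` and the helper name are hypothetical, and the body would be sent to the menu index's `_search` endpoint.

```python
def fuzzy_menu_search(text, field="item_name"):
    """Build a match query that tolerates small misspellings of `text`.

    `field` is a made-up field name for a menu item; "AUTO" lets
    Elasticsearch pick the allowed edit distance from the term length.
    """
    return {
        "query": {
            "match": {
                field: {"query": text, "fuzziness": "AUTO"}
            }
        }
    }


if __name__ == "__main__":
    # A misspelled search term such as "chiken" can still match "chicken".
    print(fuzzy_menu_search("chiken"))
```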
  • #33: Examples of how we combine MongoDB and Elasticsearch CURRENTLY at ObjectRocket
  • #34: POTENTIAL and or Theoretical New Use Cases Possibilities and Potential Combination uses are very broad – New emerging markets and areas – from cryptocurrency peripherals for persistence to
  • #35: Use MongoDB to analyze cryptocurrency price swings and intervals - https://medium.com/@serbanmihai/aggregate-mongodb-data-with-node-js-and-mongoose-cryptocurrency-financial-time-series-ae739b4c9485
  • #36: node.js (v7.8 or greater) Persistence is achieved using MongoDB tribeca - very low latency cryptocurrency market making trade bot with a full featured web client, backtester, and supports direct connectivity to several crypto coin exchanges  - reacts to market data by placing and canceling orders in under a millisecond
  • #37: Fantasy Football, wine sites - If you want to do a search and possibly a match on the words wine & red: db.comments.find( { $and: [ { $text: { $search: "wine" } }, { $text: { $search: "red" } } ] } ) WON’T work. $text is a special MongoDB operator: only use it once per query,
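The restriction in this note can be illustrated with two query bodies, built here as plain dictionaries. The first is a MongoDB filter with a single `$text` clause, where space-separated terms are OR-ed. The second is an Elasticsearch `match` query with `operator` set to `and`, which requires every term. The field name `body` and both helper names are assumptions for the example, and the MongoDB filter presumes a text index on the collection.

```python
def mongo_text_filter(terms):
    """Build ONE $text clause; MongoDB ORs the space-separated terms."""
    return {"$text": {"$search": " ".join(terms)}}


def es_match_all_terms(field, terms):
    """Build an Elasticsearch match query that requires every term."""
    return {
        "query": {
            "match": {
                field: {"query": " ".join(terms), "operator": "and"}
            }
        }
    }


if __name__ == "__main__":
    words = ["wine", "red"]
    # Filter for db.comments.find(...): matches "wine" OR "red".
    print(mongo_text_filter(words))
    # Search body for Elasticsearch: matches "wine" AND "red".
    print(es_match_all_terms("body", words))
```

This is one way to read the case for pairing the two stores: MongoDB stays the system of record while Elasticsearch answers the stricter text match.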
  • #38: Endless opportunities here to combine with other data stores: grab those result sets, store the primary results in MongoDB, perform additional aggregations to further refine them. Post online for massive, around-the-world use by colleagues. Use Elasticsearch again to keep frequently searched combinations nearby/fast.
  • #39: Endless opportunities here to combine with other data stores: grab those result sets, store the primary results in MongoDB, perform additional aggregations to further refine them. Post online for massive, around-the-world use by colleagues. Use Elasticsearch again to keep frequently searched combinations nearby/fast.
  • #41: Hiring DBA’s and CDE’s