#MDBlocal
Using Change Streams to Keep Up With Your Data
TORONTO
Kevin Albertson
November 21
[Calendar: Nov 21–28] 8 days until Black Friday
mongodb-kitchen.com
The Shard · Web Scale Sauce · The Recipe Collection
BSON Pot
for your unstructured meals™
Current Flow
[Diagram: MongoDBKitchen.com, Mobile App, Third Party API, Catalog → db.orders]
Read the source of truth
every day at 5pm
for each new order
if in inventory… ship
otherwise… manufacture, then ship
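Roughly what that nightly pull looks like as code (a sketch only; in_inventory, ship, and manufacture are assumed helpers, not from the deck):

# daily_batch.py (illustrative sketch)
from pymongo import MongoClient

client = MongoClient()
orders = client.mdbkitchen.orders

def in_inventory (sku): ...
def ship (order): ...
def manufacture (order): ...

def run_5pm_batch ():
    # re-read the source of truth once a day
    for order in orders.find({ "status": "Not Shipped" }):
        if in_inventory (order["sku"]):
            ship (order)
        else:
            manufacture (order)   # then ship once it's built
            ship (order)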
BSON Pot
• expensive to manufacture
• low margin
• can't anticipate demand
Solution: no inventory
BSON Pot
• long time to manufacture
• want quick delivery
Solution: manufacture
on demand
when customer orders BSON Pot
immediately manufacture BSON Pot
[Diagram: MongoDBKitchen.com, Mobile App, Third Party API, Catalog → db.orders]
Read the source of truth → Listen to the source of truth
pull → push
[Calendar: Nov 21–28] 8 days until Black Friday
[Calendar: Nov 21–28] 3 days to implement, 5 days of QA
[Calendar: Nov 21] Day 1
Tail the Oplog?
Not documented or supported, but…
[Diagram: replica set with a primary (P) and two secondaries (S), each with its own oplog]
Oplog: a special capped collection of operations (the "operation log")
Secondaries tail the oplog
of the sync source
> use local
> db.oplog.rs.findOne()
{
"ts": Timestamp(1573359145, 2),
"t": NumberLong(1),
"h": NumberLong(0),
"v": 2,
"op": "i",
"ns": "mdbkitchen.orders",
"ui": UUID("5de76b13-cb71-4fd1-b2da-3e9f44400162"),
"wall": ISODate("2019-11-10T04:12:25.747Z"),
"o": {
"_id": ObjectId("5dc78e29cd45383e19bbfed1"),
"sku": 7318166,
"name": "BSON Pot",
"user_id": ObjectId("5dc78e29cd45383e19bbfed0")
"status": "Not Shipped"
}
}
listener.py
db = client.local
cursor = db.oplog.rs.find(
{ "op": "i", "o.sku": sku },
cursor_type=CursorType.TAILABLE_AWAIT
)
while cursor.alive:
for doc in cursor:
manufacture_order (doc)
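The slides omit the setup those snippets rely on; a minimal sketch of the assumed surroundings (the client, the sku constant, and the imports the later variations also use; manufacture_order is a stand-in):

# assumed setup for listener.py (not shown on the slides)
import time
from bson.json_util import loads, dumps          # serialize oplog entries (Timestamps, ObjectIds)
from pymongo import MongoClient
from pymongo.cursor import CursorType
from pymongo.read_concern import ReadConcern

client = MongoClient()                           # must point at a replica set member
sku = 7318166                                    # the BSON Pot SKU used throughout the talk

def manufacture_order (doc):
    # stand-in: kick off manufacturing for the inserted order
    print ("manufacturing", doc["o"]["_id"])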
Make it Robust
listener.py
db = client.local
# resume after the last order we already processed
last_saved_match = loads (open("saved.json", "r").read())
filter = { "op": "i", "o.sku": sku }
filter["ts"] = { "$gt": last_saved_match["ts"] }
cursor = db.oplog.rs.find(
    filter,
    cursor_type=CursorType.TAILABLE_AWAIT
)
while cursor.alive:
    for doc in cursor:
        manufacture_order (doc)
        open ("saved.json", "w").write (dumps (doc))
listener.py
db = client.local
while True:
    # reopen (and re-resume) if the tailable cursor ever dies
    time.sleep (1)
    last_saved_match = loads (open("saved.json", "r").read())
    filter = { "op": "i", "o.sku": sku }
    filter["ts"] = { "$gt": last_saved_match["ts"] }
    cursor = db.oplog.rs.find(
        filter,
        cursor_type=CursorType.TAILABLE_AWAIT
    )
    while cursor.alive:
        for doc in cursor:
            manufacture_order (doc)
            open ("saved.json", "w").write (dumps (doc))
Me:
[Calendar: Nov 22] Day 2
QA: "After failover it sends
wrong events… sometimes"
Rollback
[Diagram: the listener's cursor reads entries A, B, C from the primary's oplog, but only A and B replicate to the secondaries before a failover; the server rolls back C, yet the listener can't undo it]
Can't undo…
Solution: only use majority-committed events
listener.py
db = client.local
# only see majority-committed oplog entries
rc = ReadConcern(level="majority")
coll = db.get_collection("oplog.rs", read_concern=rc)
while True:
    time.sleep (1)
    last_saved_match = loads (open("saved.json", "r").read())
    filter = { "op": "i", "o.sku": sku }
    filter["ts"] = { "$gt": last_saved_match["ts"] }
    cursor = coll.find(
        filter,
        cursor_type=CursorType.TAILABLE_AWAIT
    )
    while cursor.alive:
        for doc in cursor:
            manufacture_order (doc)
            open ("saved.json", "w").write (dumps (doc))
QA: "Restarting after long time
with no orders hurts perf"
[Diagram: saved.json holds only the last matching entry (A); after a restart, the cursor has to rescan every oplog entry from A through Z, almost all non-matches, before catching up]
Solution: save all events
listener.py
db = client.local
rc = ReadConcern(level="majority")
coll = db.get_collection("oplog.rs", read_concern=rc)
while True:
    time.sleep (1)
    last_saved_match = loads (open("saved.json", "r").read())
    # filter only by timestamp now; match op/sku in the loop instead
    filter = { "ts": { "$gt": last_saved_match["ts"] } }
    cursor = coll.find(
        filter,
        cursor_type=CursorType.TAILABLE_AWAIT
    )
    while cursor.alive:
        for doc in cursor:
            if doc["op"] == "i" and doc["o"]["sku"] == 7318166:
                manufacture_order (doc)
            # save every entry, match or not, so a restart resumes nearby
            open ("saved.json", "w").write (dumps (doc))
Me: ¯\_(ツ)_/¯
[Calendar: Nov 23] Day 3
QA: "Doesn't work when orders
is sharded"
…
Change Streams
Data event listeners
change_stream = db.orders.watch()
Added in MongoDB 3.6
Change Streams:
8 Characteristics
1. Present a Defined API
NodeJS:
let changestream = db.collection("orders").watch()
changestream.on("change", (event) => { console.log(event) })
Python:
changestream = db.orders.watch()
for event in changestream:
    print(event)
C++:
auto changestream = db["orders"].watch();
for (auto& event : changestream) {
    cout << to_json(event) << endl;
}
2. Use Access Controls
db.createRole({
role: "analyzer",
privileges: [
{
resource: { db: "test", collection: "example" },
actions: [ "find", "changeStream" ]
},
],
roles: []
})
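The same role (plus a user that holds it) can be created from Python through the generic command helper; a hedged sketch, with a made-up user name and password:

from pymongo import MongoClient

client = MongoClient()
db = client.test

# equivalent of the db.createRole(...) call above
db.command("createRole", "analyzer",
    privileges=[{
        "resource": { "db": "test", "collection": "example" },
        "actions": [ "find", "changeStream" ],
    }],
    roles=[])

# a user limited to find + changeStream on test.example
db.command("createUser", "stream_reader", pwd="changeit", roles=["analyzer"])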
3. Use on any Data-Bearing Node
[Diagram: db.coll.watch() opened against the primary (P) and against a secondary (S), each reading its own oplog]
4. Total Ordering Across Shards
[Diagram: events 3, 1, 2 arrive on different shards; mongos merges them into one totally ordered stream: 1, 2, 3]
5. Documents Uniquely Identified
Sharded Cluster:
{
  operationType: 'update',
  documentKey: {
    _id: 123,
    shardKey: 456
  },
  ...
}
Replica Set:
{
  operationType: 'update',
  documentKey: {
    _id: 123
  },
  ...
}
6. Changes are Durable
[Diagram: a failover promotes a secondary to primary while db.coll.watch() is open; only majority-committed changes are ever returned, so nothing the stream delivered gets rolled back]
7. Change Streams are Resumable
{
_id: <resumeToken>,
operationType: 'update'
...
}
[Diagram: after a failover, the driver uses the resume token to reopen db.coll.watch() against the new primary]
No duplicates
No missed events
8. Change Streams Use Aggregation
$match $project $addFields $replaceRoot $redact
coll.watch([{
$match: {
operationType: {$in: ['insert', 'update'] }
}
}]);
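For example, the $match above can be combined with a $project so each event carries only what the listener needs (a Python sketch; the field choices are illustrative):

pipeline = [
    { "$match": { "operationType": { "$in": ["insert", "update"] } } },
    # trim each event; _id (the resume token) is kept implicitly
    { "$project": { "operationType": 1, "documentKey": 1, "fullDocument.sku": 1 } },
]
change_stream = coll.watch (pipeline)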
1. Defined API
2. Access Controls
3. Use on any Data Bearing Node
4. Total Ordering
5. Uniquely Identifies Documents
6. Durable
7. Resumable
8. Aggregation
Change Stream API
change_stream =
… client.watch()
… db.watch()
… coll.watch()
Filter with pipelines
change_stream = coll.watch ([{$match: {…}}])
Iterate to get events (blocking)
doc = change_stream.next()
for doc in change_stream: ...
Set polling frequency
change_stream = coll.watch (max_await_time_ms=500)
change_stream.on("change", callback)
Or listen asynchronously (in NodeJS, the Java async driver, C#)
Track with a "resume token"
token = change_stream.resume_token
And restart
change_stream = coll.watch (resume_after=token)
Other options
fullDocument
include entire document in update events
startAfter
like resumeAfter, except it can resume a stream after an invalidate event
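In Python the update-lookup variant of fullDocument looks like this (a small sketch; by default, update events carry only the changed fields):

# ask the server to attach the current version of the document to update events
change_stream = coll.watch (full_document="updateLookup")
for event in change_stream:
    if event["operationType"] == "update":
        print (event.get("fullDocument"))   # may be None if the doc was deleted in the meantime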
Change Stream Events
Returns 8 operation types
• Insert
• Update
• Replace
• Delete
• Rename
• Drop
• DropDatabase
• Invalidate
collection.watch()
Insert, Update, Replace, Delete on the collection.
Drop/Rename on the collection. Always followed by an Invalidate.
Invalidate when the collection is dropped or renamed. Closes the stream.
database.watch()
Insert, Update, Replace, Delete on all collections in the database.
Drop/Rename on all collections in the database. Not followed by an Invalidate.
Invalidate when the database is dropped. Closes the stream.
DropDatabase when the database is dropped. Always followed by an Invalidate.
client.watch()
Insert, Update, Replace, Delete on all databases and collections.
Drop/Rename on all collections.
Invalidate on all databases and collections. Does not close the stream.
DropDatabase on all databases.
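Putting the operation types together: one way a collection-level listener might dispatch on them (a sketch; the cache helpers are assumed, not from the talk):

def handle (event):
    op = event["operationType"]
    if op in ("insert", "replace"):
        upsert_cache (event["fullDocument"])                 # assumed helper
    elif op == "update":
        apply_update (event["documentKey"],
                      event["updateDescription"])            # assumed helper
    elif op == "delete":
        remove_from_cache (event["documentKey"])             # assumed helper
    elif op == "invalidate":
        # the collection was dropped or renamed; the stream delivers nothing more
        return False
    return True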
In Action
BSON Pot
for your unstructured meals™
listener.py
db = client.mdbkitchen
pipeline = [{ "$match": { "operationType": "insert", "fullDocument.sku": sku } }]

def save_token (change_stream):
    open ("token.json", "w").write (dumps (change_stream.resume_token))

def load_token ():
    return loads (open ("token.json", "r").read())

change_stream = db.orders.watch(
    pipeline,
    start_after=load_token()
)
while change_stream.alive:
    for doc in change_stream:
        manufacture_order (doc)
        save_token (change_stream)
• MongoDB supported API
• Has retry logic
• Events aren't rolled back
• No perf hit for sparse events
• Works if sharded too
Me:
QA:
MongoDBeer
[Calendar: Nov 23] Day 3
Performance
One change stream
[Diagram: a single listener.py holding one change stream on the primary (P)]
~1000 per server
[Diagram: many listener.py processes, each with its own change stream on the primary]
Can scale with secondaries…
[Diagram: listener.py processes spread their change streams across the primary and secondaries]
but middleware is better
[Diagram: listener.py processes subscribe to a middleware layer, which holds the change stream against the primary]
e.g. Apache Kafka
[Diagram: the same picture with Apache Kafka as the middleware]
Apache Kafka is a stream processing service that can read from and write to external systems.
Any Source → Any Sink
MongoDB provides a Kafka source and sink connector, verified by Confluent:
confluent-hub install mongodb/kafka-connect-mongodb:0.2
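Connectors are registered through Kafka Connect's REST API; a hedged sketch of a source-connector registration (property names follow the connector's documentation and should be checked against the installed version; the URI, names, and port are made up):

import requests

source = {
    "name": "mdbkitchen-orders-source",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",
        "database": "mdbkitchen",
        "collection": "orders",
        # the same shape of pipeline a change stream would use
        "pipeline": "[{\"$match\": {\"operationType\": \"insert\"}}]",
        "topic.prefix": "mdbkitchen",
    },
}
requests.post("http://localhost:8083/connectors", json=source)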
Tracking state
BSON Pot
for your unstructured meals™
Want to track all BSON Pot
order state changes
orders = db.orders.with_options(read_concern=ReadConcern("majority"))
cursor = orders.find({ "sku": 7318166 })
cache = list (cursor)
# <-- an update that happens here is missed!
change_stream = orders.watch (
    [{ "$match": { "fullDocument.sku": 7318166 }}])
for event in change_stream:
    update_cache (cache, event)
Switch find and watch?
orders = db.orders.with_options(read_concern=ReadConcern("majority"))
change_stream = orders.watch (
    [{ "$match": { "fullDocument.sku": 7318166 }}])
cursor = orders.find({ "sku": 7318166 })
cache = list (cursor)
for event in change_stream:
    update_cache (cache, event)
But what if the find selects a stale secondary?
Use sessions!
Session
[Diagram: db.coll.watch() runs against secondary S1 while db.coll.find() runs against secondary S2; within the session, the find on S2 waits until S2 has replicated up to the point already seen on S1]
with client.start_session(causal_consistency=True) as session:
    orders = db.orders.with_options(read_concern=ReadConcern("majority"))
    change_stream = orders.watch (
        [{ "$match": { "fullDocument.sku": 7318166 }}], session=session)
    cursor = orders.find(
        { "sku": 7318166 }, session=session)
    cache = list (cursor)
    for event in change_stream:
        update_cache (cache, event)
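update_cache is never shown in the talk; one possible sketch, assuming the cache is re-keyed by _id and the stream uses full_document="updateLookup" (both assumptions, not from the deck):

def update_cache (cache, event):
    # cache: dict of _id -> order document
    op = event["operationType"]
    key = event["documentKey"]["_id"]
    if op in ("insert", "replace"):
        cache[key] = event["fullDocument"]
    elif op == "update":
        updated = event.get("fullDocument")    # present with full_document="updateLookup"
        if updated is not None:
            cache[key] = updated
    elif op == "delete":
        cache.pop(key, None)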
Thank you