Technical overview of Azure Cosmos DB
IN THIS SESSION …
Azure Cosmos DB Core Concepts and What’s New @ //Build/ 2018
TL;DR High-Level Overview
Resource Model
Request Units
Partitioning
Replication
Automatic Indexing
New Goodies
Q&A
AZURE COSMOS DB
A globally distributed, massively scalable, multi-model database service
APIs: SQL, MongoDB, Table API
Data models: Document, Column-family, Key-value, Graph
Turnkey global distribution
Elastic scale-out of storage & throughput
Guaranteed low latency at the 99th percentile
Comprehensive SLAs
Five well-defined consistency models
Leveraging Azure Cosmos DB to automatically scale
your data across the globe
This module will reference partitioning in the context
of all Azure Cosmos DB modules and APIs.
RESOURCE MODEL
Account → Database → Container → Item
ACCOUNT URI AND CREDENTIALS
********.azure.com
IGeAvVUp …
CREATING ACCOUNT
DATABASE REPRESENTATIONS
CONTAINER REPRESENTATIONS
Container = Collection, Graph, or Table
CREATING COLLECTIONS – SQL API
CONTAINER-LEVEL RESOURCES
Item, Conflict, Sproc, Trigger, UDF
SYSTEM TOPOLOGY (BEHIND THE SCENES)
Planet Earth → Azure regions → Datacenters → Stamps → Fault domains → Cluster → Machine → Replica → Database engine
Database engine: Resource Manager, Language Runtime(s) Hosts, Query Processor, RSM, Index Manager, Bw-tree++/LLAMA++, Log Manager, IO Manager, Resource Governor, Transport, Admission control, …
Container
Various agents
RESOURCE HIERARCHY
CONTAINERS
Logical resources “surfaced” to APIs as tables,
collections or graphs, which are made up of one or
more physical partitions or servers.
RESOURCE PARTITIONS
• Consistent, highly available, and resource-governed
coordination primitives
• Consist of replica sets, with each replica hosting an
instance of the database engine
Diagram: Tenants → Containers (Collections, Tables, Graphs) → Resource Partitions
Replica Set: Leader, Follower, Follower, Forwarder (to remote resource partition(s))
REQUEST UNITS
Request Units (RUs) are a rate-based currency
Abstracts physical resources for performing requests
Key to multi-tenancy, SLAs, and COGS efficiency
Foreground and background activities
% IOPS, % CPU, % Memory
REQUEST UNITS
Normalized across various access methods
1 RU = 1 read of a 1 KB document from a single partition
Each request consumes a fixed number of RUs
Applies to reads, writes, queries, and stored procedures
GET, POST, PUT, Query, …: each operation is charged in RUs
REQUEST UNITS
Provisioned in terms of RU/sec
Rate limiting based on amount of throughput provisioned
Can be increased or decreased instantaneously
Metered hourly
Background processes such as TTL expiration and index transformations are scheduled when the replica is quiescent
Chart labels: Incoming Requests, Min RU/sec, Max RU/sec, Replica Quiescent, Rate limit, No rate limiting
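To make the idea of rate limiting against provisioned RU/sec concrete, here is a minimal sketch using a token bucket. This is an illustrative technique only, not a description of how Azure Cosmos DB enforces limits internally; the class name `RequestUnitBudget` and its methods are hypothetical.

```python
class RequestUnitBudget:
    """Hypothetical token-bucket budget refilled at the provisioned RU/sec rate."""

    def __init__(self, provisioned_ru_per_sec: float) -> None:
        self.capacity = provisioned_ru_per_sec    # at most one second of budget is banked
        self.available = provisioned_ru_per_sec
        self.last_seen = 0.0                      # timestamp (seconds) of the previous request

    def try_consume(self, cost_ru: float, now: float) -> bool:
        """Return True if the request fits in the budget, False if it is rate limited."""
        elapsed = max(0.0, now - self.last_seen)
        # Refill in proportion to elapsed time, never beyond the provisioned rate.
        self.available = min(self.capacity, self.available + elapsed * self.capacity)
        self.last_seen = now
        if cost_ru <= self.available:
            self.available -= cost_ru
            return True
        return False


budget = RequestUnitBudget(provisioned_ru_per_sec=10)
print(budget.try_consume(5, now=0.0))   # True
print(budget.try_consume(5, now=0.0))   # True
print(budget.try_consume(1, now=0.0))   # False: budget for this instant is spent
print(budget.try_consume(1, now=0.5))   # True: half a second refilled 5 RUs
```

Passing `now` explicitly keeps the sketch deterministic; a real client would read a clock and typically retry a rejected request after a delay.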
*NEW* PROVISION RU/S FOR A SET OF CONTAINERS
Remove friction for OSS NoSQL APIs
Provision RU/sec shared across containers
Mix containers with dedicated throughput and
containers with shared throughput
Elastically scale provisioned throughput for a
set of containers at any time
ELASTIC SCALE OUT OF STORAGE AND THROUGHPUT
SCALES AS YOUR APPS’ NEEDS CHANGE
Database elastically scales storage and throughput
How? Scale-out!
Collections can span across large clusters of machines
Can start small and seamlessly grow as your app grows
PARTITIONS
Cosmos DB Container (e.g. Collection)
Partition Key: User ID
Logical Partitioning Abstraction
Behind the Scenes: Physical Partition Sets
hash(User ID)
Pseudo-random distribution of data over range of possible hashed values
PARTITIONS
Partition 1, Partition 2, …, Partition n
Frugal # of Partitions based on actual storage and throughput needs
(yielding scalability with low total cost of ownership)
hash(User ID)
Pseudo-random distribution of data over range of possible hashed values
Example User IDs spread across the partitions: Andrew, Mike, …, Bob, Dharma, Shireesh, Karthik, Rimma, Alice, Carol, …
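The hash(User ID) step can be sketched as follows. This is a minimal illustration, assuming a 32-bit hash space cut into equal ranges; the hash function (SHA-256 here), the size of the hash space, and the names `hash_key`, `range_starts`, and `partition_for` are assumptions made for the example, not the service's actual scheme.

```python
import hashlib
from bisect import bisect_right
from typing import List

HASH_SPACE = 2 ** 32  # assumed size of the range of possible hashed values


def hash_key(partition_key: str) -> int:
    """Map a partition key to a pseudo-random point in [0, HASH_SPACE)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def range_starts(partition_count: int) -> List[int]:
    """Lower bounds of equally sized, contiguous hash ranges."""
    return [i * HASH_SPACE // partition_count for i in range(partition_count)]


def partition_for(partition_key: str, starts: List[int]) -> int:
    """Index of the range (partition) that contains the key's hash."""
    return bisect_right(starts, hash_key(partition_key)) - 1


starts = range_starts(4)
for user_id in ["Andrew", "Mike", "Bob", "Dharma", "Shireesh", "Karthik", "Rimma", "Alice", "Carol"]:
    print(user_id, "-> partition", partition_for(user_id, starts))
```

Because the placement depends only on the hash, the same User ID always lands in the same range, while different IDs spread across ranges regardless of their alphabetical order.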
PARTITIONS
What happens when partitions need to grow?
PARTITIONS
Partition ranges can be dynamically sub-divided to seamlessly grow the database as the application grows while simultaneously maintaining high availability.
Partition management is fully managed by Azure Cosmos DB, so you don't have to write code or manage your partitions.
Partition x → Partition x1 + Partition x2
hash(User ID)
Pseudo-random distribution of data over range of possible hashed values
Diagram: a partition holding Dharma, Shireesh, Karthik, Rimma, Alice, Carol, … is split in two, one part holding Rimma, Karthik, … and the other Dharma, Shireesh, …
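Sub-dividing a partition range can be sketched as cutting a hash range at its midpoint and moving each key to the half that contains its hash. This is a simplified illustration under that assumption; the function name `split_partition` and the hash values in the usage example are made up for the example.

```python
from typing import Dict, Tuple

# A partition is modeled as (low, high, {key: hash}) covering hashes in [low, high).
Partition = Tuple[int, int, Dict[str, int]]


def split_partition(partition: Partition) -> Tuple[Partition, Partition]:
    """Split one hash range at its midpoint into two adjacent ranges."""
    low, high, items = partition
    mid = (low + high) // 2
    left = {key: h for key, h in items.items() if h < mid}
    right = {key: h for key, h in items.items() if h >= mid}
    return (low, mid, left), (mid, high, right)


# Hash values below are invented purely for illustration.
partition_x: Partition = (0, 100, {"Dharma": 10, "Shireesh": 30, "Karthik": 60, "Rimma": 80})
partition_x1, partition_x2 = split_partition(partition_x)
print(partition_x1)
print(partition_x2)
```

No key is rehashed during the split; only the boundaries move, which is why existing routing by hash keeps working.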
PARTITIONS
Best Practices: Design Goals for Choosing a Good Partition Key
• Distribute the overall request + storage volume
• Avoid “hot” partition keys
Steps for Success
• Ballpark scale needs (size/throughput)
• Understand the workload
• # of reads/sec vs writes/sec
• Use the Pareto principle (80/20 rule) to help optimize the bulk of the workload
• For reads – understand top 3-5 queries (look for common filters)
• For writes – understand transactional needs
General Tips
• Build a POC to strengthen your understanding of the workload and iterate (avoid analysis paralysis)
• Don’t be afraid of having too many partition keys
• Partition keys are logical
• More partition keys → more scalability
• Partition key is the scope for multi-record transactions and routing queries
• Queries can be intelligently routed via partition key
• Omitting partition key on query requires fan-out
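The difference between a routed query and a fan-out query can be sketched with a small in-memory store. This is a toy model, assuming simple hash-modulo placement; `PartitionedStore` and its methods are hypothetical and unrelated to any Cosmos DB SDK.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

Doc = Dict[str, object]


class PartitionedStore:
    """Toy store that places documents by hashing the partition key field."""

    def __init__(self, partition_count: int, key_field: str) -> None:
        self.key_field = key_field
        self.partitions: List[List[Doc]] = [[] for _ in range(partition_count)]

    def _index(self, key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def insert(self, doc: Doc) -> None:
        self.partitions[self._index(str(doc[self.key_field]))].append(doc)

    def query(self, predicate: Callable[[Doc], bool],
              partition_key: Optional[str] = None) -> Tuple[List[Doc], int]:
        """Return (matching docs, number of partitions visited)."""
        if partition_key is not None:
            targets = [self.partitions[self._index(partition_key)]]  # routed
        else:
            targets = self.partitions                                # fan-out
        hits = [doc for part in targets for doc in part if predicate(doc)]
        return hits, len(targets)


store = PartitionedStore(partition_count=4, key_field="userId")
for user, item in [("Alice", "mug"), ("Bob", "book"), ("Alice", "pen")]:
    store.insert({"userId": user, "item": item})

is_alice = lambda doc: doc["userId"] == "Alice"
print(store.query(is_alice, partition_key="Alice"))  # visits 1 partition
print(store.query(is_alice))                         # visits all 4 partitions
```

Both calls return the same documents; the routed call simply does less work, which is the reason to include the partition key in the most frequent queries.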
*NEW* BULK EXECUTOR LIBRARY
Easy out-of-the-box bulk operation functionality
Supports bulk import and update
Automatically handles congestion control + transient errors
10x client-side performance improvement
Easily scale out clients across more VMs
Available starting with .NET and Java
TURNKEY GLOBAL DISTRIBUTION
High Availability
• Automatic and Manual Failover
• Multi-homing API removes need for app redeployment
Low Latency (anywhere in the world)
• Packets cannot move faster than the speed of light
• Sending a packet across the world under ideal network conditions takes hundreds of milliseconds
• You can cheat the speed of light – using data locality
• CDNs solved this for static content
• Azure Cosmos DB solves this for dynamic content
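The latency claim can be checked with back-of-the-envelope arithmetic. The sketch below assumes a path of half the Earth's circumference (about 20,037.5 km) and a signal speed in optical fiber of roughly two thirds of the speed of light in vacuum; both figures are approximations chosen for the example, and real routes add switching and routing delays on top.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458      # speed of light in vacuum
FIBER_SPEED_FRACTION = 2 / 3               # assumed fraction of c inside optical fiber
HALF_EARTH_CIRCUMFERENCE_KM = 20_037.5     # approximate distance to the far side of the planet


def one_way_latency_ms(distance_km: float,
                       speed_fraction: float = FIBER_SPEED_FRACTION) -> float:
    """Best-case propagation delay in milliseconds, ignoring all processing delays."""
    return distance_km / (SPEED_OF_LIGHT_KM_PER_S * speed_fraction) * 1000.0


one_way = one_way_latency_ms(HALF_EARTH_CIRCUMFERENCE_KM)
print(f"one way:    {one_way:.1f} ms")
print(f"round trip: {2 * one_way:.1f} ms")
```

Under these assumptions the one-way delay is roughly 100 ms and the round trip roughly 200 ms, before any server time, which is why serving reads from a nearby region matters.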
TURNKEY GLOBAL DISTRIBUTION
• Automatic and transparent replication worldwide
• Each partition hosts a replica set per region
• Customers can test end-to-end application availability by programmatically simulating failovers
• All regions are hidden behind a single global URI with multi-homing capabilities
• Customers can dynamically add / remove additional regions at any time
Diagram: a Container with Partition-key = "airport" (values "LAX", "AMS", "MEL"), shown in West US and West Europe at 30K transactions/sec each, with regions labeled Writes/Reads or Reads.
Local Distribution (via horizontal partitioning)
Global Distribution (of resource partitions)
REPLICATING DATA GLOBALLY
AUTOMATIC FAILOVER
MANUAL FAILOVER
FIVE WELL-DEFINED CONSISTENCY MODELS
Strong, Bounded-staleness, Session, Consistent prefix, Eventual
CHOOSE THE BEST CONSISTENCY MODEL FOR YOUR APP
Five well-defined consistency models
Overridable on a per-request basis
Provides control over performance-consistency tradeoffs, backed by comprehensive SLAs.
An intuitive programming model offering low latency and high availability for your planet-scale app.
CLEAR TRADEOFFS
• Latency
• Availability
• Throughput
*NEW* MULTI-MASTER (PREVIEW)
Perfect for Intelligent Cloud
and Intelligent Edge Applications
Write scalability around the world
Low latency writes around the world
99.999% High Availability around the world
Well-defined consistency models
Comprehensive conflict management
HANDLE ANY DATA WITH NO SCHEMA OR INDEXING REQUIRED
Azure Cosmos DB’s schema-less service automatically indexes all your data, regardless of the data model, to deliver blazing fast queries.
Item | Color | Microwave safe | Liquid capacity | CPU | Memory | Storage
Geek mug | Graphite | Yes | 16oz | ??? | ??? | ???
Coffee Bean mug | Tan | No | 12oz | ??? | ??? | ???
Surface book | Gray | ??? | ??? | 3.4 GHz Intel Skylake Core i7-6600U | 16GB | 1 TB SSD
• Automatic index management
• Synchronous auto-indexing
• No schemas or secondary indices needed
• Works across every data model
INDEXING JSON DOCUMENTS
{
"locations": [
{
"country": "Germany",
"city": "Berlin"
},
{
"country": "France",
"city": "Paris"
}
],
"headquarter": "Belgium",
"exports": [
{ "city": "Moscow" },
{ "city": "Athens" }
]
}
Tree representation:
locations
  0
    country: Germany
    city: Berlin
  1
    country: France
    city: Paris
headquarter: Belgium
exports
  0
    city: Moscow
  1
    city: Athens
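One way to read the tree view of this document is as a set of root-to-leaf paths, where array positions become path segments and each leaf value ends a path. The sketch below illustrates that reading; the function name `index_paths` and the `/segment/segment/value` path format are assumptions made for the example, not the service's internal encoding.

```python
import json
from typing import Iterator, Tuple


def index_paths(node: object, prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield one '/a/b/value' string per leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from index_paths(value, prefix + (key,))
    elif isinstance(node, list):
        for position, value in enumerate(node):
            yield from index_paths(value, prefix + (str(position),))
    else:
        yield "/" + "/".join(prefix + (str(node),))


document = json.loads("""
{
  "locations": [
    {"country": "Germany", "city": "Berlin"},
    {"country": "France", "city": "Paris"}
  ],
  "headquarter": "Belgium",
  "exports": [{"city": "Moscow"}, {"city": "Athens"}]
}
""")

for path in index_paths(document):
    print(path)
```

Each printed line corresponds to one branch of the tree, for example `/locations/0/country/Germany`.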
INDEXING JSON DOCUMENTS
{
"locations": [
{
"country": "Germany",
"city": "Bonn",
"revenue": 200
}
],
"headquarter": "Italy",
"exports": [
{
"city": "Berlin",
"dealers": [
{ "name": "Hans" }
]
},
{ "city": "Athens" }
]
}
Tree representation:
locations
  0
    country: Germany
    city: Bonn
    revenue: 200
headquarter: Italy
exports
  0
    city: Berlin
    dealers
      0
        name: Hans
  1
    city: Athens
INVERTED INDEX
Merged tree of both documents:
locations
  0
    country: Germany
    city: Berlin, Bonn
    revenue: 200
  1
    country: France
    city: Paris
headquarter: Belgium, Italy
exports
  0
    city: Moscow, Berlin
    dealers
      0
        name: Hans
  1
    city: Athens
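The merged tree can be read as an inverted index: each root-to-leaf path maps to the set of documents that contain it. The following is a minimal sketch of that idea using the two example documents from the slides; the document ids `doc1` and `doc2`, the path format, and the function names are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterator, Set, Tuple


def index_paths(node: object, prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield one '/a/b/value' string per leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from index_paths(value, prefix + (key,))
    elif isinstance(node, list):
        for position, value in enumerate(node):
            yield from index_paths(value, prefix + (str(position),))
    else:
        yield "/" + "/".join(prefix + (str(node),))


def build_inverted_index(documents: Dict[str, object]) -> Dict[str, Set[str]]:
    """Map every path to the ids of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, doc in documents.items():
        for path in index_paths(doc):
            index[path].add(doc_id)
    return index


documents = {
    "doc1": {
        "locations": [{"country": "Germany", "city": "Berlin"},
                      {"country": "France", "city": "Paris"}],
        "headquarter": "Belgium",
        "exports": [{"city": "Moscow"}, {"city": "Athens"}],
    },
    "doc2": {
        "locations": [{"country": "Germany", "city": "Bonn", "revenue": 200}],
        "headquarter": "Italy",
        "exports": [{"city": "Berlin", "dealers": [{"name": "Hans"}]},
                    {"city": "Athens"}],
    },
}

index = build_inverted_index(documents)
print(sorted(index["/locations/0/country/Germany"]))  # ['doc1', 'doc2']
print(sorted(index["/headquarter/Italy"]))            # ['doc2']
```

A lookup on a path answers "which documents have this value at this position" without scanning the documents themselves.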
INDEX POLICIES
CUSTOM INDEXING POLICIES
Though all Azure Cosmos DB data is indexed by default, you
can specify a custom indexing policy for your collections.
Custom indexing policies allow you to design and customize
the shape of your index while maintaining schema flexibility.
• Define trade-offs between storage, write and query
performance, and query consistency
• Include or exclude documents and paths to and from the
index
• Configure various index types
{
"automatic": true,
"indexingMode": "Consistent",
"includedPaths": [{
"path": "/*",
"indexes": [{
"kind": "Hash",
"dataType": "String",
"precision": -1
}, {
"kind": "Range",
"dataType": "Number",
"precision": -1
}, {
"kind": "Spatial",
"dataType": "Point"
}]
}],
"excludedPaths": [{
"path": "/nonIndexedContent/*"
}]
}
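To illustrate how included and excluded paths in a policy like the one above might interact, here is a small sketch that decides whether a document path is indexed. It assumes that a trailing `/*` matches everything beneath a path and that the longest matching pattern wins; these matching rules and the function names are simplifications chosen for the example and should be checked against the official indexing policy documentation.

```python
from typing import Dict, List


def _matches(pattern: str, path: str) -> bool:
    """'/a/*' matches anything under '/a/'; other patterns must match exactly."""
    if pattern.endswith("/*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def is_indexed(path: str, policy: Dict[str, List[Dict[str, str]]]) -> bool:
    """Assumed rule: the longest (most specific) matching pattern decides."""
    best_length = -1
    indexed = False
    for include, key in ((True, "includedPaths"), (False, "excludedPaths")):
        for entry in policy.get(key, []):
            pattern = entry["path"]
            if _matches(pattern, path) and len(pattern) > best_length:
                best_length = len(pattern)
                indexed = include
    return indexed


# Only the path fields of the policy are kept; index kinds are omitted here.
policy = {
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/nonIndexedContent/*"}],
}

print(is_indexed("/headquarter", policy))             # True
print(is_indexed("/nonIndexedContent/blob", policy))  # False
```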
PROVISION THROUGHPUT FOR A SET OF CONTAINERS
Remove friction for OSS NoSQL APIs
Provision RU/sec shared across containers
Mix containers with dedicated throughput and
containers with shared throughput
Elastically scale provisioned throughput for a
set of containers at any time
BULK EXECUTOR LIBRARY
Easy out-of-the-box bulk operation functionality
Supports bulk import and update
Automatically handles congestion control + transient errors
10x client-side performance improvement
Easily scale-out clients across more VMs
Available starting with .NET and Java
MULTI-MASTER @ GLOBAL SCALE (PREVIEW)
Perfect for Intelligent Cloud
and Intelligent Edge Applications
Write scalability around the world
Low latency writes around the world
99.999% High Availability around the world
Well-defined consistency models
Comprehensive conflict management
VNET SERVICE ENDPOINT
Secure communication without exposing public endpoints
Limit access to specific VNET(s) and subnet(s)
Compatible with IP Firewall ACLs
Available in all Azure regions
JAVA ASYNC LIBRARY FOR SQL API
New Async API surface for event-based programs with observable sequences
Leverages popular RxJava library
2x client-side performance improvement
Improved user experience
RECAP
Azure Cosmos DB Core Concepts and What’s New @ //Build/ 2018
TL;DR High-Level Overview
Resource Model
Request Units
Partitioning
Replication
Automatic Indexing
New Goodies
Q&A
Technical overview of Azure Cosmos DB
Technical overview of Azure Cosmos DB

More Related Content

What's hot (20)

PPTX
Azure Fundamentals || AZ-900
thisiswali
 
PPTX
Azure Storage Services - Part 01
Neeraj Kumar
 
PPTX
Azure Governance
Benjamin Hüpeden
 
PDF
Announcing Databricks Cloud (Spark Summit 2014)
Databricks
 
PPTX
Azure DataBricks for Data Engineering by Eugene Polonichko
Dimko Zhluktenko
 
PPTX
Azure governance
girish goudar
 
PPTX
Introduction to AWS Lake Formation.pptx
SwathiPonugumati
 
PPTX
Azure Overview Arc
rajramab
 
PPTX
Azure Backup Simplifies
Tanawit Chansuchai
 
PPTX
Azure Databricks - An Introduction (by Kris Bock)
Daniel Toomey
 
PDF
【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2021年7月版]
オラクルエンジニア通信
 
PDF
Snowflake Company Presentation
AndrewJiang18
 
PDF
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
Amazon Web Services Korea
 
PDF
Migrate to Microsoft Azure with Confidence
David J Rosenthal
 
PDF
Mastering Azure Monitor
Richard Conway
 
PDF
Oracle Data Guard による高可用性
Yahoo!デベロッパーネットワーク
 
PDF
Iceberg + Alluxio for Fast Data Analytics
Alluxio, Inc.
 
PPTX
Azure Cost Management
Stefano Tempesta
 
PDF
Oracle Cloud is Best for Oracle Database - High Availability
Markus Michalewicz
 
PDF
Oracle Cloud Storage Service & Oracle Database Backup Cloud Service
Jean-Philippe PINTE
 
Azure Fundamentals || AZ-900
thisiswali
 
Azure Storage Services - Part 01
Neeraj Kumar
 
Azure Governance
Benjamin Hüpeden
 
Announcing Databricks Cloud (Spark Summit 2014)
Databricks
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Dimko Zhluktenko
 
Azure governance
girish goudar
 
Introduction to AWS Lake Formation.pptx
SwathiPonugumati
 
Azure Overview Arc
rajramab
 
Azure Backup Simplifies
Tanawit Chansuchai
 
Azure Databricks - An Introduction (by Kris Bock)
Daniel Toomey
 
【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2021年7月版]
オラクルエンジニア通信
 
Snowflake Company Presentation
AndrewJiang18
 
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
Amazon Web Services Korea
 
Migrate to Microsoft Azure with Confidence
David J Rosenthal
 
Mastering Azure Monitor
Richard Conway
 
Oracle Data Guard による高可用性
Yahoo!デベロッパーネットワーク
 
Iceberg + Alluxio for Fast Data Analytics
Alluxio, Inc.
 
Azure Cost Management
Stefano Tempesta
 
Oracle Cloud is Best for Oracle Database - High Availability
Markus Michalewicz
 
Oracle Cloud Storage Service & Oracle Database Backup Cloud Service
Jean-Philippe PINTE
 

Similar to Technical overview of Azure Cosmos DB (20)

PDF
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...
Andre Essing
 
PDF
Azure Cosmos DB - Technical Deep Dive
Andre Essing
 
PDF
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database
BizTalk360
 
PPTX
Tech-Spark: Exploring the Cosmos DB
Ralph Attard
 
PDF
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Bob Pusateri
 
PDF
Modeling data and best practices for the Azure Cosmos DB.
Mohammad Asif
 
PDF
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Bob Pusateri
 
PDF
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Bob Pusateri
 
PPTX
cosmodb ppt.pptxfkhkfsgkhgfkfghkhsadaljlsfdfhkgjh
Central University of South Bihar
 
PPTX
Tour de France Azure PaaS 3/7 Stocker des informations
Alex Danvy
 
PDF
[db tech showcase Tokyo 2019] Azure Cosmos DB Deep Dive ~ Partitioning, Globa...
Naoki (Neo) SATO
 
PDF
Dealing with Azure Cosmos DB
Mihail Mateev
 
PPTX
cosmodb ppt personal.pptxgskjhkjsfgkhkjgskhk
Central University of South Bihar
 
PPTX
Azure Cosmos DB by Mohammed Gadi AUG April 2019
Mohammed Gadi
 
PDF
Lessons learnt from building a globally distributed database service from the...
J On The Beach
 
PPTX
Session: Modern Data WareHouse
Karina Matos
 
PPTX
Azure CosmosDb - Where we are
Marco Parenzan
 
PDF
Cosmos DB at VLDB 2019
Dharma Shukla
 
PPTX
Cosmos db
Martino Bordin
 
PPTX
Azure Cosmos DB + Gremlin API in Action
Denys Chamberland
 
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...
Andre Essing
 
Azure Cosmos DB - Technical Deep Dive
Andre Essing
 
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database
BizTalk360
 
Tech-Spark: Exploring the Cosmos DB
Ralph Attard
 
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Bob Pusateri
 
Modeling data and best practices for the Azure Cosmos DB.
Mohammad Asif
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Bob Pusateri
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Bob Pusateri
 
cosmodb ppt.pptxfkhkfsgkhgfkfghkhsadaljlsfdfhkgjh
Central University of South Bihar
 
Tour de France Azure PaaS 3/7 Stocker des informations
Alex Danvy
 
[db tech showcase Tokyo 2019] Azure Cosmos DB Deep Dive ~ Partitioning, Globa...
Naoki (Neo) SATO
 
Dealing with Azure Cosmos DB
Mihail Mateev
 
cosmodb ppt personal.pptxgskjhkjsfgkhkjgskhk
Central University of South Bihar
 
Azure Cosmos DB by Mohammed Gadi AUG April 2019
Mohammed Gadi
 
Lessons learnt from building a globally distributed database service from the...
J On The Beach
 
Session: Modern Data WareHouse
Karina Matos
 
Azure CosmosDb - Where we are
Marco Parenzan
 
Cosmos DB at VLDB 2019
Dharma Shukla
 
Cosmos db
Martino Bordin
 
Azure Cosmos DB + Gremlin API in Action
Denys Chamberland
 
Ad

More from Microsoft Tech Community (20)

PPTX
100 ways to use Yammer
Microsoft Tech Community
 
PPTX
10 Yammer Group Suggestions
Microsoft Tech Community
 
PPTX
Removing Security Roadblocks to IoT Deployment Success
Microsoft Tech Community
 
PPTX
Building mobile apps with Visual Studio and Xamarin
Microsoft Tech Community
 
PPTX
Best practices with Microsoft Graph: Making your applications more performant...
Microsoft Tech Community
 
PPTX
Interactive emails in Outlook with Adaptive Cards
Microsoft Tech Community
 
PPTX
Unlocking security insights with Microsoft Graph API
Microsoft Tech Community
 
PPTX
Break through the serverless barriers with Durable Functions
Microsoft Tech Community
 
PPTX
Multiplayer Server Scaling with Azure Container Instances
Microsoft Tech Community
 
PPTX
Explore Azure Cosmos DB
Microsoft Tech Community
 
PPTX
Media Streaming Apps with Azure and Xamarin
Microsoft Tech Community
 
PPTX
DevOps for Data Science
Microsoft Tech Community
 
PPTX
Real-World Solutions with PowerApps: Tips & tricks to manage your app complexity
Microsoft Tech Community
 
PPTX
Azure Functions and Microsoft Graph
Microsoft Tech Community
 
PPTX
Ingestion in data pipelines with Managed Kafka Clusters in Azure HDInsight
Microsoft Tech Community
 
PPTX
Getting Started with Visual Studio Tools for AI
Microsoft Tech Community
 
PPTX
Using AML Python SDK
Microsoft Tech Community
 
PPTX
Mobile Workforce Location Tracking with Bing Maps
Microsoft Tech Community
 
PPTX
Cognitive Services Labs in action Anomaly detection
Microsoft Tech Community
 
PPTX
Speech Devices SDK
Microsoft Tech Community
 
100 ways to use Yammer
Microsoft Tech Community
 
10 Yammer Group Suggestions
Microsoft Tech Community
 
Removing Security Roadblocks to IoT Deployment Success
Microsoft Tech Community
 
Building mobile apps with Visual Studio and Xamarin
Microsoft Tech Community
 
Best practices with Microsoft Graph: Making your applications more performant...
Microsoft Tech Community
 
Interactive emails in Outlook with Adaptive Cards
Microsoft Tech Community
 
Unlocking security insights with Microsoft Graph API
Microsoft Tech Community
 
Break through the serverless barriers with Durable Functions
Microsoft Tech Community
 
Multiplayer Server Scaling with Azure Container Instances
Microsoft Tech Community
 
Explore Azure Cosmos DB
Microsoft Tech Community
 
Media Streaming Apps with Azure and Xamarin
Microsoft Tech Community
 
DevOps for Data Science
Microsoft Tech Community
 
Real-World Solutions with PowerApps: Tips & tricks to manage your app complexity
Microsoft Tech Community
 
Azure Functions and Microsoft Graph
Microsoft Tech Community
 
Ingestion in data pipelines with Managed Kafka Clusters in Azure HDInsight
Microsoft Tech Community
 
Getting Started with Visual Studio Tools for AI
Microsoft Tech Community
 
Using AML Python SDK
Microsoft Tech Community
 
Mobile Workforce Location Tracking with Bing Maps
Microsoft Tech Community
 
Cognitive Services Labs in action Anomaly detection
Microsoft Tech Community
 
Speech Devices SDK
Microsoft Tech Community
 
Ad

Recently uploaded (20)

PPTX
COMPARISON OF RASTER ANALYSIS TOOLS OF QGIS AND ARCGIS
Sharanya Sarkar
 
PDF
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
PDF
The 2025 InfraRed Report - Redpoint Ventures
Razin Mustafiz
 
PDF
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
PPTX
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
PDF
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
PPTX
MuleSoft MCP Support (Model Context Protocol) and Use Case Demo
shyamraj55
 
PDF
SIZING YOUR AIR CONDITIONER---A PRACTICAL GUIDE.pdf
Muhammad Rizwan Akram
 
PDF
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
PPTX
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
PDF
UPDF - AI PDF Editor & Converter Key Features
DealFuel
 
PPTX
Designing_the_Future_AI_Driven_Product_Experiences_Across_Devices.pptx
presentifyai
 
PDF
Kit-Works Team Study_20250627_한달만에만든사내서비스키링(양다윗).pdf
Wonjun Hwang
 
PDF
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
PPTX
Seamless Tech Experiences Showcasing Cross-Platform App Design.pptx
presentifyai
 
PDF
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
PDF
NLJUG Speaker academy 2025 - first session
Bert Jan Schrijver
 
PDF
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
PDF
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
PPTX
Digital Circuits, important subject in CS
contactparinay1
 
COMPARISON OF RASTER ANALYSIS TOOLS OF QGIS AND ARCGIS
Sharanya Sarkar
 
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
The 2025 InfraRed Report - Redpoint Ventures
Razin Mustafiz
 
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
MuleSoft MCP Support (Model Context Protocol) and Use Case Demo
shyamraj55
 
SIZING YOUR AIR CONDITIONER---A PRACTICAL GUIDE.pdf
Muhammad Rizwan Akram
 
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
UPDF - AI PDF Editor & Converter Key Features
DealFuel
 
Designing_the_Future_AI_Driven_Product_Experiences_Across_Devices.pptx
presentifyai
 
Kit-Works Team Study_20250627_한달만에만든사내서비스키링(양다윗).pdf
Wonjun Hwang
 
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
Seamless Tech Experiences Showcasing Cross-Platform App Design.pptx
presentifyai
 
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
NLJUG Speaker academy 2025 - first session
Bert Jan Schrijver
 
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
Digital Circuits, important subject in CS
contactparinay1
 

Technical overview of Azure Cosmos DB

  • 3. I N T H I S S E S S I O N … Azure Cosmos DB Core Concepts and What’s New @ //Build/ 2018 TL;DR High-Level Overview Resource Model Request Units Partitioning Replication Automatic Indexing New Goodies Q&A
  • 5. SQL MongoDB Table API Turnkey global distribution Elastic scale out of storage & throughput Guaranteed low latency at the 99th percentile Comprehensive SLAs Five well-defined consistency models A Z U R E C O S M O S D B DocumentColumn-family Key-value Graph A globally distributed, massively scalable, multi-model database service
  • 7. Leveraging Azure Cosmos DB to automatically scale your data across the globe This module will reference partitioning in the context of all Azure Cosmos DB modules and APIs. R E S O U R C E M O D E L Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem
  • 8. Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem A C C O U N T U R I A N D C R E D E N T I A L S ********.azure.com IGeAvVUp …
  • 9. C R E AT I N G A C C O U N T Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem
  • 10. D ATA B A S E R E P R E S E N TAT I O N S Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem DatabaseDatabaseContainer DatabaseDatabaseItem
  • 11. C O N TA I N E R R E P R E S E N TAT I O N S Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem = Collection Graph Table
  • 12. C R E AT I N G C O L L E C T I O N S – S Q L A P I Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem
  • 13. C O N TA I N E R - L E V E L R E S O U R C E S Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem ConflictSproc Trigger UDF
  • 14. S Y S T E M TO P O LO G Y ( B E H I N D T H E S C E N E S ) Resource Manager Language Runtime(s) Hosts Query Processor RSM Index Manager Bw-tree++/ LLAMA++ Log Manager IO Manager Resource Governor Transport Database engine Admission control … … Planet Earth Azure regions Datacenters Stamps Fault domains Cluster Machine Replica Database engine Container Various agents
  • 15. R E S O U R C E H I E R A R C H Y CONTAINERS Logical resources “surfaced” to APIs as tables, collections or graphs, which are made up of one or more physical partitions or servers. RESOURCE PARTITIONS • Consistent, highly available, and resource-governed coordination primitives • Consist of replica sets, with each replica hosting an instance of the database engine Containers Resource Partitions CollectionsTables Graphs Tenants Leader Follower Follower Forwarder Replica Set To remote resource partition(s)
  • 17. R E Q U E S T U N I T S Request Units (RUs) is a rate-based currency Abstracts physical resources for performing requests Key to multi-tenancy, SLAs, and COGS efficiency Foreground and background activities % IOPS% CPU% Memory
  • 18. R E Q U E S T U N I T S Normalized across various access methods 1 read of 1 KB document from a single partition Each request consumes fixed RUs Applies to reads, writes, query, and stored procedures GET POST PUT Query … = = = =
  • 19. R E Q U E S T U N I T S Provisioned in terms of RU/sec Rate limiting based on amount of throughput provisioned Can be increased or decreased instantaneously Metered Hourly Background processes like TTL expiration, index transformations scheduled when quiescent Min RU/sec Max RU/sec IncomingRequests Replica Quiescent Rate limit No rate limiting
  • 20. * N E W * P R O V I S I O N R U / S F O R A S E T O F C O N TA I N E R S Remove friction for OSS NoSQL APIs Provision RU/sec shared across containers Mix containers with dedicated throughput and containers with shared throughput Elastically scale provisioned throughput for a set of containers at any time
  • 22. E L A S T I C S C A L E O U T O F S TO R A G E A N D T H R O U G H P U T SCALES AS YOUR APPS’ NEEDS CHANGE Database elastically scales storage and throughput How? Scale-out! Collections can span across large clusters of machines Can start small and seamlessly grow as your app grows
  • 23. E L A S T I C S C A L E O U T O F S TO R A G E A N D T H R O U G H P U T SCALES AS YOUR APPS’ NEEDS CHANGE Database elastically scales storage and throughput How? Scale-out! Collections can span across large clusters of machines Can start small and seamlessly grow as your app grows
  • 24. PA R T I T I O N S Cosmos DB Container (e.g. Collection) Partition Key: User ID Logical Partitioning Abstraction Behind the Scenes: Physical Partition Sets hash(User ID) Psuedo-random distribution of data over range of possible hashed values
  • 25. PA R T I T I O N S … Partition 1 Partition 2 Partition n Frugal # of Partitions based on actual storage and throughput needs (yielding scalability with low total cost of ownership) hash(User ID) Pseudo-random distribution of data over range of possible hashed values Andrew Mike … Bob Dharma Shireesh Karthik Rimma Alice Carol …
  • 26. PA R T I T I O N S … Partition 1 Partition 2 Partition n What happens when partitions need to grow? hash(User ID) Pseudo-random distribution of data over range of possible hashed values Andrew Mike … Bob Dharma Shireesh Karthik Rimma Alice Carol …
  • 27. PA R T I T I O N S Partition Ranges can be dynamically sub-divided to seamlessly grow database as the application grows while simultaneously maintaining high availability. Partition management is fully managed by Azure Cosmos DB, so you don't have to write code or manage your partitions. + Partition x Partition x1 Partition x2 hash(User ID) Pseudo-random distribution of data over range of possible hashed values Rimma Karthik … Dharma Shireesh Karthik Rimma Alice Carol … Dharma Shireesh …
  • 28. PA R T I T I O N S Best Practices: Design Goals for Choosing a Good Partition Key • Distribute the overall request + storage volume • Avoid “hot” partition keys Steps for Success • Ballpark scale needs (size/throughput) • Understand the workload • # of reads/sec vs writes per sec • Use pareto principal (80/20 rule) to help optimize bulk of workload • For reads – understand top 3-5 queries (look for common filters) • For writes – understand transactional needs General Tips • Build a POC to strengthen your understanding of the workload and iterate (avoid analyses paralysis) • Don’t be afraid of having too many partition keys • Partitions keys are logical • More partition keys  more scalability • Partition Key is scope for multi-record transactions and routing queries • Queries can be intelligently routed via partition key • Omitting partition key on query requires fan-out
  • 29. * N E W * B U L K E X E C U TO R L I B R A R Y Easy out-of-the-box bulk operation functionality Supports bulk import and update Auto handles congestion control + transient errors 10x client-side performance improvement Easily scale-out clients across more VMs Available starting with .NET and Java
  • 32. T U R N K E Y G LO B A L D I S T R I B U T I O N High Availability • Automatic and Manual Failover • Multi-homing API removes need for app redeployment Low Latency (anywhere in the world) • Packets cannot move fast than the speed of light • Sending a packet across the world under ideal network conditions takes 100’s of milliseconds • You can cheat the speed of light – using data locality • CDN’s solved this for static content • Azure Cosmos DB solves this for dynamic content
  • 33. T U R N K E Y G LO B A L D I S T R I B U T I O N • Automatic and transparent replication worldwide • Each partition hosts a replica set per region • Customers can test end-to-end application availability by programmatically simulating failovers • All regions are hidden behind a single global URI with multi-homing capabilities • Customers can dynamically add / remove additional regions at any time Diagram: a container with partition key = "airport" (values "LAX", "AMS", "MEL") is distributed locally via horizontal partitioning and globally across the West US and West Europe regions (global distribution of resource partitions); each region is labeled 30K transactions/sec and serves writes and reads.
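The effect of multi-homing behind a single global URI can be pictured as a lookup over an ordered list of preferred regions. The function `resolve_region` and its arguments are hypothetical; the sketch only illustrates that the application's configuration stays the same when a region fails over.

```python
def resolve_region(preferred_regions, available_regions):
    """Pick the first preferred region that is currently available."""
    for region in preferred_regions:
        if region in available_regions:
            return region
    raise RuntimeError("no preferred region is available")


# Assumed preference order for an application running near West US.
preferred = ["West US", "West Europe"]

# Normal operation: both regions are up.
region_before = resolve_region(preferred, {"West US", "West Europe"})

# Simulated failover: West US is taken out of rotation.
region_after = resolve_region(preferred, {"West Europe"})
```

The same preference list is used before and after the simulated failover, so no redeployment is needed to reach the surviving region.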
  • 34. R E P L I C AT I N G D ATA G LO B A L LY
  • 35. R E P L I C AT I N G D ATA G LO B A L LY
  • 36. R E P L I C AT I N G D ATA G LO B A L LY
  • 37. A U TO M AT I C FA I LO V E R
  • 38. A U TO M AT I C FA I LO V E R
  • 39. M A N U A L FA I LO V E R
  • 40. F I V E W E L L - D E F I N E D C O N S I S T E N C Y M O D E L S Choose the best consistency model for your app • Five well-defined consistency models: Strong, Bounded-staleness, Session, Consistent prefix, Eventual • Overridable on a per-request basis • Provides control over performance-consistency tradeoffs, backed by comprehensive SLAs • An intuitive programming model offering low latency and high availability for your planet-scale app • Clear tradeoffs: latency, availability, throughput
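To make one of the five levels concrete, here is a toy model of the read-your-writes guarantee that session consistency provides. The classes `Replica` and `Session`, and the use of a log sequence number (LSN) as the session token, are simplifying assumptions for illustration rather than the service's actual protocol.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.lsn = 0      # last log sequence number applied on this replica
        self.data = {}

    def apply(self, lsn, key, value):
        self.data[key] = value
        self.lsn = lsn


class Session:
    """Tracks the highest LSN this client has written (its session token)."""

    def __init__(self):
        self.token = 0

    def write(self, primary, key, value):
        lsn = primary.lsn + 1
        primary.apply(lsn, key, value)
        self.token = max(self.token, lsn)
        return lsn

    def read(self, replicas, key):
        # Only accept a replica that has caught up to this session's writes.
        for replica in replicas:
            if replica.lsn >= self.token:
                return replica.name, replica.data.get(key)
        raise RuntimeError("no replica has caught up to this session yet")


primary = Replica("primary")
secondary = Replica("secondary")   # has not yet received the write below
session = Session()
session.write(primary, "cart", ["mug"])

# The lagging secondary is skipped, so the client sees its own write.
served_by, value = session.read([secondary, primary], "cart")
```

An eventually consistent read could have been served by the lagging secondary and returned nothing; that difference is the latency-versus-consistency tradeoff the slide refers to.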
  • 41. * N E W * M U LT I - M A S T E R ( P R E V I E W ) Perfect for Intelligent Cloud and Intelligent Edge applications • Write scalability around the world • Low-latency writes around the world • 99.999% high availability around the world • Well-defined consistency models • Comprehensive conflict management
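When several regions accept writes, concurrent updates to the same item must be reconciled. Last-writer-wins is one common policy; the sketch below assumes each version carries a numeric timestamp field named `_ts` and a `region` field, and it breaks timestamp ties by region name, which is an arbitrary choice made here for determinism.

```python
def resolve_last_writer_wins(versions, ts_field="_ts"):
    """Keep the version with the highest timestamp (ties: highest region name)."""
    return max(versions, key=lambda v: (v[ts_field], v["region"]))


# Two regions updated the same item before replication caught up.
conflicting_versions = [
    {"id": "1", "region": "West US", "_ts": 100, "value": "a"},
    {"id": "1", "region": "West Europe", "_ts": 105, "value": "b"},
]

winner = resolve_last_writer_wins(conflicting_versions)
```

Other policies, such as custom merge logic, trade this simplicity for application-specific outcomes.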
  • 43. H A N D L E A N Y D ATA W I T H N O S C H E M A O R I N D E X I N G R E Q U I R E D Azure Cosmos DB’s schema-less service automatically indexes all your data, regardless of the data model, to deliver blazing fast queries. • Automatic index management • Synchronous auto-indexing • No schemas or secondary indices needed • Works across every data model Example items with different properties (fields an item lacks are shown as ???): Geek mug – Color: Graphite, Microwave safe: Yes, Liquid capacity: 16oz, CPU: ???, Memory: ???, Storage: ??? • Coffee Bean mug – Color: Tan, Microwave safe: No, Liquid capacity: 12oz, CPU: ???, Memory: ???, Storage: ??? • Surface book – Color: Gray, Microwave safe: ???, Liquid capacity: ???, CPU: 3.4 GHz Intel Skylake Core i7-6600U, Memory: 16GB, Storage: 1 TB SSD
  • 44. I N D E X I N G J S O N D O C U M E N T S { "locations": [ { "country": "Germany", "city": "Berlin" }, { "country": "France", "city": "Paris" } ], "headquarter": "Belgium", "exports": [ { "city": "Moscow" }, { "city": "Athens" } ] } Diagram: the document drawn as a tree. The root branches into locations, headquarter, and exports; array indices (0, 1) and property names (country, city) are intermediate nodes; the values (Germany, Berlin, France, Paris, Belgium, Moscow, Athens) are leaves.
  • 45. I N D E X I N G J S O N D O C U M E N T S { "locations": [ { "country": "Germany", "city": "Bonn", "revenue": 200 } ], "headquarter": "Italy", "exports": [ { "city": "Berlin", "dealers": [ { "name": "Hans" } ] }, { "city": "Athens" } ] } Diagram: the second document drawn as a tree. The root branches into locations, headquarter, and exports; array indices (0, 1) and property names (country, city, revenue, dealers, name) are intermediate nodes; the values (Germany, Bonn, 200, Italy, Berlin, Hans, Athens) are leaves.
  • 46. I N D E X I N G J S O N D O C U M E N T S Diagram: the trees of the two documents shown side by side. Both share the top-level branches locations, headquarter, and exports; they differ in leaf values (Bonn, 200, Italy, Hans in one; Berlin as a location, France, Paris, Belgium, Moscow in the other) and have some paths in common (for example locations/0/country/Germany and exports/1/city/Athens).
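  • 47. I N V E R T E D I N D E X Diagram: the two document trees merged into a single tree. Shared paths (locations, headquarter, exports, array indices, property names) appear once, and the leaves carry the values from both documents (Germany, Berlin, Bonn, 200, France, Paris, Belgium, Italy, Moscow, Athens, Hans).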
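The merge of the two document trees into an inverted index can be sketched by flattening each document into root-to-leaf paths and recording which documents contain each path. The functions `paths` and `build_inverted_index` and the document IDs 1 and 2 are illustrative assumptions; the service's real index format is not shown in the slides.

```python
from collections import defaultdict


def paths(node, prefix=""):
    """Yield every root-to-leaf path of a JSON-like value as a string."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from paths(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        for position, value in enumerate(node):
            yield from paths(value, f"{prefix}/{position}")
    else:
        yield f"{prefix}/{node}"


def build_inverted_index(docs):
    """Map each path to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for path in paths(doc):
            index[path].add(doc_id)
    return index


doc1 = {
    "locations": [
        {"country": "Germany", "city": "Berlin"},
        {"country": "France", "city": "Paris"},
    ],
    "headquarter": "Belgium",
    "exports": [{"city": "Moscow"}, {"city": "Athens"}],
}
doc2 = {
    "locations": [{"country": "Germany", "city": "Bonn", "revenue": 200}],
    "headquarter": "Italy",
    "exports": [{"city": "Berlin", "dealers": [{"name": "Hans"}]}, {"city": "Athens"}],
}

index = build_inverted_index({1: doc1, 2: doc2})
```

A query such as "documents whose first location is in Germany" then becomes a single lookup of `/locations/0/country/Germany`, which both documents share.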
  • 48. I N D E X P O L I C I E S Custom indexing policies: Though all Azure Cosmos DB data is indexed by default, you can specify a custom indexing policy for your collections. Custom indexing policies allow you to design and customize the shape of your index while maintaining schema flexibility. • Define trade-offs between storage, write and query performance, and query consistency • Include or exclude documents and paths to and from the index • Configure various index types Example policy: { "automatic": true, "indexingMode": "Consistent", "includedPaths": [{ "path": "/*", "indexes": [{ "kind": "Hash", "dataType": "String", "precision": -1 }, { "kind": "Range", "dataType": "Number", "precision": -1 }, { "kind": "Spatial", "dataType": "Point" }] }], "excludedPaths": [{ "path": "/nonIndexedContent/*" }] }
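How included and excluded paths interact can be sketched with a small matcher. The function `is_indexed` is hypothetical, handles only a trailing `*` wildcard, and assumes that the most specific (longest) matching path wins; treat it as an approximation of the policy semantics, not the service's evaluator.

```python
def is_indexed(path, policy):
    """Return True if the longest matching policy entry is an included path."""

    def best_match_length(entries):
        best = -1
        for entry in entries:
            pattern = entry["path"]
            if pattern.endswith("*"):
                prefix = pattern[:-1]
                matched = path.startswith(prefix)
            else:
                prefix = pattern
                matched = path == pattern
            if matched:
                best = max(best, len(prefix))
        return best

    included = best_match_length(policy.get("includedPaths", []))
    excluded = best_match_length(policy.get("excludedPaths", []))
    return included > excluded


# The path rules from the policy on this slide (index kinds omitted).
policy = {
    "automatic": True,
    "indexingMode": "Consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/nonIndexedContent/*"}],
}

name_indexed = is_indexed("/name", policy)
blob_indexed = is_indexed("/nonIndexedContent/blob", policy)
```

With this policy everything is indexed except the subtree under `/nonIndexedContent`, which saves index storage and write cost for content that is never queried.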
  • 50. P R O V I S I O N T H R O U G H P U T F O R A S E T O F C O N TA I N E R S • Remove friction for OSS NoSQL APIs • Provision RU/sec shared across containers • Mix containers with dedicated throughput and containers with shared throughput • Elastically scale provisioned throughput for a set of containers at any time
  • 51. B U L K E X E C U TO R L I B R A R Y • Easy out-of-the-box bulk operation functionality • Supports bulk import and update • Automatically handles congestion control + transient errors • 10x client-side performance improvement • Easily scale out clients across more VMs • Available starting with .NET and Java
  • 52. M U LT I - M A S T E R @ G LO B A L S C A L E ( P R E V I E W ) Perfect for Intelligent Cloud and Intelligent Edge applications • Write scalability around the world • Low-latency writes around the world • 99.999% high availability around the world • Well-defined consistency models • Comprehensive conflict management
  • 53. V N E T S E R V I C E E N D P O I N T • Secure communication without exposing public endpoints • Limit access to specific VNETs and subnets • Compatible with IP firewall ACLs • Available in all Azure regions
  • 54. J AVA A S Y N C L I B R A R Y F O R S Q L A P I • New async API surface for event-based programs with observable sequences • Leverages the popular RxJava library • 2x client-side performance improvement • Improved user experience
  • 55. R E C A P Azure Cosmos DB Core Concepts and What’s New @ //Build/ 2018 • TL;DR High-Level Overview • Resource Model • Request Units • Partitioning • Replication • Automatic Indexing • New Goodies • Q&A