Modeling data and best practices
for the Azure Cosmos DB
Mohammad Asif Waquar
@asifwaquar
about me
Senior Software Engineer at ABN AMRO
https://www.linkedin.com/in/mohammad-asif-6a6153111/
SQL PASS Chapter Team
@arrnagaraj
@Sachit_Keshari
@SanjivVenkatram
@sarbjitgill
@aaroh_bits
@Pioisms
Agenda
Intro Cosmos DB
Resource Model
Data Modelling Strategy & Partitioning
Demo SQL API
Turnkey global distribution
Elastic scale out
of storage & throughput
Comprehensive SLAs
Guaranteed low latency at the 99th percentile
Five well-defined consistency models
Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service
Data models: Key-value, Column-family, Document, Graph
APIs: Table API (key-value), Cosmos DB’s API for MongoDB
Features
• Multi-model data paradigm: key-value, document, graph, column-family
• Low latency for 99% of queries: less than 10 ms for read operations and less than 15 ms for (indexed) write operations
• Designed for high throughput
• Availability, data consistency, and latency backed by SLAs (up to 99.999%)
• Configurable throughput
• Automatic replication (master-slave)
• Automatic data indexing
• Configurable data consistency: five levels (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual)
HOW’S THE
THROUGHPUT ?
Resource Model
CONTAINERS
Logical resources “surfaced” to APIs as tables, collections or graphs, which are made up of one or more physical partitions or servers.
RESOURCE PARTITIONS
• Consistent, highly available, and resource-governed coordination primitives
• Consist of replica sets, with each replica hosting an instance of the database engine
[Diagram: tenants' containers (collections, tables, graphs) map to resource partitions; each resource partition is a replica set with a leader, followers, and a forwarder to remote resource partition(s)]
Resource Hierarchy
Account → Database → Container → Item
Account URI and Credentials
********.azure.com
pass…
Creating Account
Database Representations
Container Representations
Container = Collection, Graph, or Table
Item Representations
Container:  Collection   Graph            Table
Item:       Document     Vertices/Edges   Row
Container-Level Resources
Item, Conflict, Stored procedure, Trigger, UDF
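The Account → Database → Container → Item hierarchy can be sketched as nested Python records. This is only an illustration: the class names, the `item_link` helper, the account URI, and the database and container names are all hypothetical and not part of any SDK. The `dbs/…/colls/…/docs/…` path shape follows the resource links used by the Cosmos DB REST API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Container:
    """A container (collection, graph, or table) holding items keyed by id."""
    name: str
    items: Dict[str, dict] = field(default_factory=dict)


@dataclass
class Database:
    """A database groups containers under one account."""
    name: str
    containers: Dict[str, Container] = field(default_factory=dict)


@dataclass
class Account:
    """Top of the hierarchy: Account -> Database -> Container -> Item."""
    uri: str
    databases: Dict[str, Database] = field(default_factory=dict)


def item_link(database: str, container: str, item_id: str) -> str:
    """Build a REST-style resource link for an item (illustrative helper)."""
    return f"dbs/{database}/colls/{container}/docs/{item_id}"


# Hypothetical names, for illustration only.
account = Account(uri="https://example-account.invalid")
db = account.databases.setdefault("portfolio", Database("portfolio"))
users = db.containers.setdefault("users", Container("users"))
users.items["1"] = {"id": "1", "name": "John Smith"}

link = item_link(db.name, users.name, "1")
print(link)
```

Each level owns a dictionary of the level below it, which mirrors how one account holds many databases, one database many containers, and one container many items.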
Data Modelling Strategy & Partitioning
Ways to Model Your Data
Normalize everything
Embed as 1 piece
Data Modelling: Relational vs. Document
User Table
UserID  Name        Dob
1       John Smith  8/30/1964
Holdings Table
StockID  UserID  Qty  Symbol
1        1       100  MSFT
2        1       75   WMT
Document
{
  "id": 1,
  "name": "John Smith",
  "dob": "1964-08-30",
  "holdings": [
    { "qty": 100, "symbol": "MSFT" },
    { "qty": 75, "symbol": "WMT" }
  ]
}
Relational Store          Document Store
Rows                      Documents
Columns                   Properties
Strongly-typed schemas    Schema-free
Highly normalized         Typically denormalized
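A minimal sketch of how the two relational tables collapse into the single document shown on this slide. The `to_document` function is a hypothetical helper, and the date of birth is written in ISO 8601 form as an assumption.

```python
import json

# Rows as they appear in the relational tables (date in ISO 8601 form).
users = [{"UserID": 1, "Name": "John Smith", "Dob": "1964-08-30"}]
holdings = [
    {"StockID": 1, "UserID": 1, "Qty": 100, "Symbol": "MSFT"},
    {"StockID": 2, "UserID": 1, "Qty": 75, "Symbol": "WMT"},
]


def to_document(user, holding_rows):
    """Embed a user's holdings rows into one denormalized document."""
    return {
        "id": user["UserID"],
        "name": user["Name"],
        "dob": user["Dob"],
        "holdings": [
            {"qty": h["Qty"], "symbol": h["Symbol"]}
            for h in holding_rows
            if h["UserID"] == user["UserID"]  # the foreign key becomes nesting
        ],
    }


document = to_document(users[0], holdings)
print(json.dumps(document, indent=2))
```

The foreign key `UserID` disappears from the holdings because the relationship is now expressed by nesting rather than by a join.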
Modelling challenges
• How to de-normalize?
• How to normalize?
• To embed or reference?
• Can I apply joins?
• Should I put different data types in the same collection, or in different ones?
Modelling challenges: To embed or reference?
Document
{
  "id": 1,
  "name": "John Smith",
  "dob": "1964-08-30",
  "holdings": [
    { "qty": 100, "symbol": "MSFT" },
    { "qty": 75, "symbol": "WMT" }
  ]
}
Embed: comments stored inside the post document
{
  "postid": "1",
  "title": "My blog post",
  "body": "Post content…",
  "comments": [
    "comment #1",
    "comment #2",
    "comment #3",
    "comment #4",
    :
    "comment #1598873",
    :
  ]
}
Reference: the post and each comment stored as separate documents
{
  "postid": "1",
  "title": "My blog post",
  "body": "Post content…"
}
{
  "postid": "1",
  "comment": "comment #3"
}
When to embed?
o Data that is queried together should live together.
o Child data is dependent on the parent.
o 1:1 relationships, e.g. every customer has an email, a phone number, and an NRIC number.
o Data doesn’t change frequently, e.g. email and address don’t change too often.
o Embedding usually gives better read performance at the cost of write performance, so it suits workloads that are not write-heavy.
When to reference ?
o 1 : many (unbounded relationship)
o many : many relationships
o Data changes at different rates
o What is referenced, is heavily referenced by many others
o Typically provides better write performance
o But may require more network calls for reads
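The read/write trade-off between the two approaches can be sketched with a toy in-memory store. The dictionaries and function names below are hypothetical stand-ins for stored documents and requests, not a database client, and the request counts are only a rough proxy for network calls.

```python
# Toy in-memory "store": each dict stands for one stored document.
embedded_store = {
    "post-1": {
        "postid": "1",
        "title": "My blog post",
        "comments": ["comment #1", "comment #2", "comment #3"],
    }
}

referenced_posts = {"post-1": {"postid": "1", "title": "My blog post"}}
referenced_comments = [
    {"postid": "1", "comment": "comment #1"},
    {"postid": "1", "comment": "comment #2"},
    {"postid": "1", "comment": "comment #3"},
]


def read_embedded(post_key):
    """One lookup returns the post together with its comments."""
    post = embedded_store[post_key]
    return post["comments"], 1  # (comments, number of requests)


def read_referenced(post_key):
    """One lookup for the post, then one query for its comments."""
    post = referenced_posts[post_key]
    comments = [c["comment"] for c in referenced_comments
                if c["postid"] == post["postid"]]
    return comments, 2


def add_comment_referenced(postid, text):
    """Adding a comment writes one small document; the post is untouched."""
    referenced_comments.append({"postid": postid, "comment": text})


embedded_comments, embedded_requests = read_embedded("post-1")
ref_comments, ref_requests = read_referenced("post-1")
```

With embedding, adding a comment means rewriting the whole post document, which grows with every comment. With referencing, each write stays small but reading a full post costs an extra request.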
Why is choice of partition key so important?
o Enables your data in Cosmos DB to scale
o Large impact on performance of system
What can go wrong?
o Hot partitions
o A key choice that forces many cross-partition queries for the workload
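One rough way to spot a hot partition before choosing a key is to measure how evenly a candidate key spreads sample data. The sketch below uses made-up order records and hypothetical property names (`country`, `userId`); it only counts items, whereas real skew also depends on request volume per key.

```python
from collections import Counter


def partition_skew(items, key):
    """Return (share of items in the largest logical partition, counts)."""
    counts = Counter(item[key] for item in items)
    largest_share = max(counts.values()) / len(items)
    return largest_share, counts


# Hypothetical order records: most traffic comes from a single country.
orders = (
    [{"orderId": i, "country": "NL", "userId": f"user-{i}"} for i in range(90)]
    + [{"orderId": 90 + i, "country": "SG", "userId": f"user-{90 + i}"}
       for i in range(10)]
)

country_share, _ = partition_skew(orders, "country")
user_share, _ = partition_skew(orders, "userId")
print(f"country: {country_share:.0%} of items in the largest partition")
print(f"userId:  {user_share:.0%} of items in the largest partition")
```

Here `country` concentrates 90 of 100 items under one key value, while `userId` spreads them evenly. The flip side is that a query filtering by country would then have to fan out across partitions.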
Partitioning
Logical partition: Stores all data associated with the same partition key value
Physical partition: Fixed amount of reserved SSD-backed storage + compute.
Cosmos DB distributes logical partitions among a smaller number of physical partitions.
From your perspective: define 1 partition key per container
Partitioning
Partition Key: User Id
Logical Partitioning Abstraction: Cosmos DB Container (e.g. Collection)
Behind the Scenes: Physical Partition Sets
hash(User Id): pseudo-random distribution of data over the range of possible hashed values
[Diagram: user IDs (John, Dharma, Shireesh, Nilesh, Sukhi, Bob, Milton, Melvin, Karen, …) are hashed into Range 1, Range 2, … Range n, held by Physical Partition 1, Physical Partition 2, … Physical Partition n]
Frugal # of partitions based on actual storage and throughput needs
(yielding scalability with low total cost of ownership)
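The idea of hashing a partition key into ranges owned by physical partitions can be sketched as follows. MD5, the 32-bit hash space, and the partition count of 3 are assumptions chosen for illustration; they are not the hash function or layout Cosmos DB actually uses.

```python
import hashlib

NUM_PHYSICAL_PARTITIONS = 3  # assumed for illustration
HASH_SPACE = 2 ** 32


def hash_key(partition_key: str) -> int:
    """Map a partition key value to a point in the hash space.

    MD5 is a stand-in here; it is not the hash Cosmos DB uses.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def physical_partition(partition_key: str) -> int:
    """Return the index of the hash range (physical partition) owning the key."""
    range_size = HASH_SPACE // NUM_PHYSICAL_PARTITIONS
    # Clamp so the few values past the last full range land in the last one.
    return min(hash_key(partition_key) // range_size,
               NUM_PHYSICAL_PARTITIONS - 1)


users = ["John", "Dharma", "Shireesh", "Nilesh", "Sukhi", "Bob", "Milton"]
placement = {user: physical_partition(user) for user in users}
for user, index in placement.items():
    print(f"{user} -> Physical Partition {index + 1}")
```

All items sharing a User Id hash to the same value, so a logical partition never straddles two physical partitions.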
What happens when partitions need to grow?
Partition ranges can be dynamically sub-divided
to seamlessly grow the database as the application grows
while sedulously maintaining high availability
[Diagram: Partition X (Dharma, Shireesh, Nilesh, Sukhi, Bob, Milton, …) over Range X is split into Partition X1 (Dharma, Shireesh, …) over Range X1 and Partition X2 (Nilesh, Sukhi, …) over Range X2; Range 1 and Range 2 are unchanged]
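A minimal sketch of sub-dividing one partition range into two. Splitting at the midpoint is an assumption made for simplicity, the hashed values are made up, and `PartitionRange` and `split` are hypothetical names. How the service actually picks a split point is not described in these slides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PartitionRange:
    """A half-open range [low, high) of hashed key values and its keys."""
    name: str
    low: int
    high: int
    keys: List[Tuple[int, str]]  # (hashed value, partition key)


def split(partition: PartitionRange) -> Tuple[PartitionRange, PartitionRange]:
    """Sub-divide a range at its midpoint and move each key to its half."""
    mid = (partition.low + partition.high) // 2
    left = PartitionRange(partition.name + "1", partition.low, mid,
                          [k for k in partition.keys if k[0] < mid])
    right = PartitionRange(partition.name + "2", mid, partition.high,
                           [k for k in partition.keys if k[0] >= mid])
    return left, right


# Hashed values below are made up for the example.
x = PartitionRange("X", 0, 100, [(10, "Dharma"), (30, "Shireesh"),
                                 (60, "Nilesh"), (85, "Sukhi")])
x1, x2 = split(x)
print(x1.name, [key for _, key in x1.keys])
print(x2.name, [key for _, key in x2.keys])
```

Only the range being split is touched; keys in other ranges keep their placement, which is why a split does not require rehashing the whole container.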
Best of All:
Partition management is completely taken care of by the system
You don’t have to lift a finger… the database takes care of you.
Replication and Consistency
How do you ensure consistent reads across replicas?
- Define a consistency level
Replication within a region
- Data moves extremely fast (typically within 1 ms) between neighboring racks
Global replication
- It takes hundreds of milliseconds to move data across continents
Stronger consistency: higher latency, lower availability
Weaker consistency: lower latency, higher availability
Replication and Consistency
Well-Defined Consistency Models
Consistency Level: Guarantees
Strong: Linearizability (once an operation is complete, it is visible to all). No dirty reads.
Bounded Staleness: Consistent prefix. Reads lag behind writes by at most k prefixes or t interval. Dirty reads are possible, bounded by time and updates. Similar properties to strong consistency (except within the staleness window), while preserving 99.99% availability and low latency.
Session: Consistent prefix. Within a session: predictable consistency, high read throughput, and low latency. No dirty reads for writers (read your own writes); dirty reads are possible for other users.
Consistent Prefix: Reads never see out-of-order writes (no gaps).
Eventual: Potential for out-of-order reads. Lowest cost for reads of all consistency levels.
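The "read your own writes" guarantee of session consistency can be illustrated with a toy model of one leader and one lagging follower. This is a conceptual sketch, not how Cosmos DB is implemented: the `Replica` and `Cluster` classes, the use of a log length as a session token, and the fallback to the leader are all assumptions. The follower always holds an in-order prefix of the leader's log, so the tokenless read shows staleness rather than out-of-order reads.

```python
class Replica:
    """A replica that has applied a prefix of the leader's write log."""

    def __init__(self):
        self.log = []  # applied writes, in order

    @property
    def lsn(self):
        """Sequence number of the last applied write."""
        return len(self.log)

    def read(self, key):
        """Return the latest applied value for key, or None."""
        for k, v in reversed(self.log):
            if k == key:
                return v
        return None


class Cluster:
    def __init__(self):
        self.leader = Replica()
        self.follower = Replica()

    def write(self, key, value):
        """Append to the leader and return a session token (the new LSN)."""
        self.leader.log.append((key, value))
        return self.leader.lsn

    def replicate(self, count=1):
        """Copy the next `count` writes to the follower, preserving order."""
        start = self.follower.lsn
        self.follower.log.extend(self.leader.log[start:start + count])

    def read_without_token(self, key):
        """Read from the follower regardless of how far behind it is."""
        return self.follower.read(key)

    def read_session(self, key, session_token):
        """Use the follower only if it has caught up to the session token."""
        if self.follower.lsn >= session_token:
            return self.follower.read(key)
        return self.leader.read(key)


cluster = Cluster()
token = cluster.write("qty", 100)
token = cluster.write("qty", 75)
cluster.replicate(1)  # the follower has only the first write so far

stale = cluster.read_without_token("qty")       # another user's view
own_write = cluster.read_session("qty", token)  # the writer's view
```

The writer, holding the token from its last write, sees its own update, while a reader without the token may still see the older value until replication catches up.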
Let’s see in action
Application uses
Important Links
Pricing Calculator
https://azure.microsoft.com/en-us/pricing/calculator/?service=cosmos-db#cosmos-db7aed2059-b457-48cc-a0e9-6744ce81096b
SQL API Query
https://docs.microsoft.com/en-us/azure/cosmos-db/sql-query-getting-started
Azure Cosmos Emulator
https://docs.microsoft.com/en-us/azure/cosmos-db/local-emulator#controlling-the-emulator
Data Migration Tool
http://www.microsoft.com/en-us/download/details.aspx?id=46436
Questions?
Thank you
