Building a Hybrid Data Cluster with MongoDB and Postgres
A solution based on PostgreSQL’s Foreign Data Wrapper
27 April 2015
Context and Customer Scenario
Customer Requirements for Hybrid Cluster
- More and more unstructured data is being generated
- NoSQL databases are increasingly used and required because of
  - their usage scenarios
  - their ability to scale horizontally
- Challenges
  - Many admins and developers still prefer SQL as an easy and intuitive tool for querying the available data
  - Few NoSQL databases support complex queries the way SQL does, e.g. JOINs and sub-queries
Real Life Use Cases
- NoSQL as an archive store for an RDBMS
  - The RDBMS stores the operational and transactional data
  - while NoSQL acts as an archive store for historical data
- NoSQL for receiving a write stream
  - NoSQL databases accumulate data from various sources with high write throughput across multiple shards
  - while the RDBMS stores the filtered data after it has been transformed into proper structures
  - The RDBMS makes it easier for users to query the data using SQL and JOINs
Hybrid Data Cluster is the ‘need of the hour’
PostgreSQL
- Most advanced open source database
- Supports the relational model of storing data
- Supports ACID transactions
- Multi-Version Concurrency Control (MVCC)
- Write-ahead log (WAL) files
- Scalability with tablespaces and partitions/child tables
- Supports unstructured data types (JSON, JSONB, HSTORE) and full text search
MongoDB
- Most popular NoSQL database for a vast set of workloads
- Best for storing unstructured data
- Horizontal scalability through sharding
- Provision for secondary indexes
- Aggregation and map-reduce features
Foreign Data Wrappers of PostgreSQL
- Get the best of both worlds
- Based on SQL/MED (Management of External Data)
- Allows you to create FOREIGN TABLES which map to external entities
- These entities could be
  - a table in an RDBMS
  - a collection in MongoDB
  - or the respective entities in HDFS or a file system
- More about FDW in Postgres: https://wiki.postgresql.org/wiki/Foreign_data_wrappers
FDW for MongoDB
MongoDB FDW
- Started by CitusDB and then forked by EnterpriseDB
- More details: https://github.com/EnterpriseDB/mongo_fdw
- The example discussed here is based on a blog post from EnterpriseDB: http://www.enterprisedb.com/postgres-plus-edb-blog/jason-davis/tales-trenches-new-mongodb-fdw
- Let’s go through the demo
Preparing the MongoDB
Prepare for a MongoDB Cluster
- Platform: Windows 7
- Create the directories that you will need
  - cd d:\mongodb
  - mkdir a0
  - mkdir b0
  - mkdir c0
  - mkdir c1
  - mkdir c2
  - mkdir d0
  - mkdir d1
  - mkdir d2
  - mkdir cfg0
  - mkdir cfg1
  - mkdir cfg2
Create the services for MongoDB Cluster: Config Server
mongod --configsvr --dbpath d:\mongodb\cfg0 --port 26050 --install --logpath d:\mongodb\cfg0.log --serviceName new_mongod_cfg0 --serviceDisplayName new_mongod_cfg0
net start new_mongod_cfg0
mongod --configsvr --dbpath d:\mongodb\cfg1 --port 26051 --install --logpath d:\mongodb\cfg1.log --serviceName new_mongod_cfg1 --serviceDisplayName new_mongod_cfg1
net start new_mongod_cfg1
mongod --configsvr --dbpath d:\mongodb\cfg2 --port 26052 --install --logpath d:\mongodb\cfg2.log --serviceName new_mongod_cfg2 --serviceDisplayName new_mongod_cfg2
net start new_mongod_cfg2
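The config-server commands differ only in their index, so they can be generated rather than typed out. The following is a minimal sketch in Python, assuming the data directory is d:\mongodb and that ports count up from 26050 as on the slide. The helper name config_server_commands is hypothetical, and the script only prints the commands; it does not run them.

```python
# Sketch: generate the install/start commands for the config servers.
# BASE_DIR and the 26050+ port numbering follow the slides; the helper
# name is made up for illustration. Nothing is executed here.
BASE_DIR = r"d:\mongodb"


def config_server_commands(index, base_port=26050):
    """Return the (install, start) command pair for config server `index`."""
    name = f"new_mongod_cfg{index}"
    data_dir = rf"{BASE_DIR}\cfg{index}"
    install = (
        f"mongod --configsvr --dbpath {data_dir} --port {base_port + index} "
        f"--install --logpath {data_dir}.log "
        f"--serviceName {name} --serviceDisplayName {name}"
    )
    return install, f"net start {name}"


if __name__ == "__main__":
    for i in range(3):
        for command in config_server_commands(i):
            print(command)
```

The same pattern would apply to the shard services, with the extra flags used for them added to the template.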
Create the services for MongoDB Cluster: Create Shards
mongod --shardsvr --replSet a --dbpath d:\mongodb\a0 --logpath d:\mongodb\a0.log --port 27000 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a0 --serviceDisplayName new_mongod_shrd_a0
net start new_mongod_shrd_a0
mongod --shardsvr --replSet b --dbpath d:\mongodb\b0 --logpath d:\mongodb\b0.log --port 27100 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_b0 --serviceDisplayName new_mongod_shrd_b0
net start new_mongod_shrd_b0
mongod --shardsvr --replSet c --dbpath d:\mongodb\c0 --logpath d:\mongodb\c0.log --port 27200 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_c0 --serviceDisplayName new_mongod_shrd_c0
net start new_mongod_shrd_c0
Create the services for MongoDB Cluster: Optionally Create the Replicas
- For simplicity we have skipped creating additional replica set members here, but you can add them
- e.g.
  - mkdir a1
  - mongod --shardsvr --replSet a --dbpath d:\mongodb\a1 --logpath d:\mongodb\a1.log --port 27001 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a1 --serviceDisplayName new_mongod_shrd_a1
  - net start new_mongod_shrd_a1
Initiate the Mongos
- mongos --configdb sameer:26050,sameer:26051,sameer:26052 --install --serviceName new_mongos_svc0 --serviceDisplayName new_mongos_svc0 --logpath d:\mongodb\mongos0.log --port 26060
- net start new_mongos_svc0
Initiate the Replica Set
- I am going to initiate a 1-member replica set for each of my shards
- Shard A
mongo --port 27000
> rs.initiate()
a:OTHER> rs.conf()
a:PRIMARY> exit
- Shard B
mongo --port 27100
> rs.initiate()
b:OTHER> rs.conf()
b:PRIMARY> exit
- Shard C
mongo --port 27200
> rs.initiate()
c:OTHER> rs.conf()
c:PRIMARY> exit
Setup Sharding
mongo --port 26060 test
mongos> sh.addShard("sameer:27100")
mongos> sh.addShard("sameer:27200")
mongos> sh.addShard("sameer:27000")
mongos> sh.enableSharding("db")
mongos> sh.shardCollection("db.warehouse",{warehouse_created:1},true)
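Sharding db.warehouse on {warehouse_created:1} is ranged sharding: the key space is cut into chunks, each chunk covers a contiguous range of key values, and each chunk lives on one shard. A rough sketch of that routing idea in Python follows. The split points and the chunk-to-shard placement are invented for illustration; in a real cluster MongoDB picks and rebalances them itself.

```python
# Sketch of ranged sharding: each chunk covers a contiguous range of shard-key
# values and is owned by one shard. The split points and chunk placement below
# are invented for illustration; MongoDB chooses and rebalances them itself.
from bisect import bisect_right
from datetime import datetime

# Hypothetical split points on the shard key `warehouse_created`.
SPLIT_POINTS = [datetime(2013, 12, 1), datetime(2014, 6, 1)]
# One owner per chunk: (-inf, split0), [split0, split1), [split1, +inf).
CHUNK_OWNERS = ["a", "b", "c"]


def shard_for(warehouse_created):
    """Return the shard owning the chunk that contains this key value."""
    return CHUNK_OWNERS[bisect_right(SPLIT_POINTS, warehouse_created)]


if __name__ == "__main__":
    for created in (
        datetime(2013, 11, 12),
        datetime(2013, 12, 12),
        datetime(2014, 12, 12),
    ):
        print(created.date(), "->", shard_for(created))
```

Because the key is a creation timestamp, new documents tend to land in the chunk with the highest range, which is worth keeping in mind when choosing a shard key for a write-heavy collection.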
Setup Users and Security
mongos> use db
mongos> db.createUser(
...   {
...     user: "superuser",
...     pwd: "password",
...     roles: [ { role: "root", db: "admin" } ]
...   }
... )
Creating FDW Extension in Postgres
Build MongoDB FDW
- Download MongoDB FDW from GitHub
- Installation is quite easy when you use autogen.sh
  - cd $PATH_WHERE_FDW_IS_EXTRACTED
  - ./autogen.sh
- It will automatically install all the required components
  - libbson
  - libmongoc
- Once that is done, you can build and install
  - make -f Makefile.meta && make -f Makefile.meta install
Features of mongo_fdw
- Allows you to build with the legacy driver or the master branch driver
- Has read and write capability for the foreign table
- Connection pooling, which reuses the same MongoDB connection for queries in the same session
- Build with MongoDB's legacy branch driver
  - ./autogen.sh --with-legacy
- Build with MongoDB's master branch driver
  - ./autogen.sh --with-master
Using mongo_fdw
- Create the mongo_fdw extension in the PostgreSQL database
  - You may create it in the template database
- Create a foreign data server
- Create a user mapping that maps a Postgres user to a MongoDB user
- Create a foreign table which maps to a MongoDB collection
Create Foreign Server: Example
- psql=# CREATE EXTENSION mongo_fdw;
- psql=# CREATE SERVER mongo_server
         FOREIGN DATA WRAPPER mongo_fdw
         OPTIONS (address '192.168.160.1', port '26060');
- psql=# CREATE USER MAPPING FOR postgres
         SERVER mongo_server
         OPTIONS (username 'superuser',
                  password 'password');
Create Foreign Table: Example
- psql=# CREATE FOREIGN TABLE warehouse(
           _id NAME,
           warehouse_id int,
           warehouse_name text,
           warehouse_created timestamptz)
         SERVER mongo_server
         OPTIONS (database 'db', collection 'warehouse');
‘_id’ column of MongoDB
- It stores a unique Object ID
- By default, if you skip this field, MongoDB will insert a 12-byte BSON ObjectId
- While inserting data into MongoDB you may choose the value of this field
- In mongo_fdw you have to define the _id column with its data type as “NAME”
- mongo_fdw will ignore the value inserted in the _id column and let MongoDB generate it
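To make the 12-byte ObjectId more concrete, here is a small Python sketch that splits one into its parts. It assumes the usual layout in which the first 4 bytes are the creation time in seconds since the Unix epoch (big-endian) and the remaining 8 bytes make the value unique. The function name parse_object_id and the sample id are made up for illustration.

```python
# Sketch: split a 12-byte ObjectId (given as 24 hex characters) into its parts.
# Layout assumed here: 4-byte big-endian creation time in seconds since the
# Unix epoch, followed by 8 bytes that make the value unique.
from datetime import datetime, timezone


def parse_object_id(hex_id):
    """Return (creation time in UTC, remaining 8 bytes as hex)."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[:4], "big")
    created = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return created, raw[4:].hex()


if __name__ == "__main__":
    # Made-up sample id; its first four bytes encode 2014-12-12T07:12:10Z.
    created, rest = parse_object_id("548a954a0123456789abcdef")
    print(created.isoformat(), rest)
```

This also shows why the foreign table declares _id as a text-like type: on the Postgres side the value is handled as the hex string rather than as raw bytes.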
DML on Foreign Tables
- INSERT INTO warehouse values (0, 1, 'UPS', '2014-12-12T07:12:10Z');
- INSERT INTO warehouse values (0, 2, 'EMS', '2013-12-12T07:12:10Z');
- INSERT INTO warehouse values (0, 3, 'ASX', '2013-11-12T07:12:10Z');
- UPDATE warehouse set warehouse_name = 'UPS_NEW' where warehouse_id = 1;
Operations on MongoDB
- Connect to MongoDB
  - mongo --port 26060 --username superuser --password password
- Check the data in the collection
  - db.warehouse.find()
Operations in Postgres on Foreign Data
- You can run ANALYZE on the foreign table to collect statistics
- You can run queries with a WHERE clause
- You can run JOIN queries against other foreign tables or native PostgreSQL tables
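As a mental model for such a JOIN, the sketch below joins rows that stand in for the foreign warehouse table with rows of a native table. It is conceptual only and assumes the join itself runs inside Postgres; how much of a query is pushed down to MongoDB depends on the mongo_fdw version. The shipments table, the sample rows and the hash_join helper are all hypothetical.

```python
# Conceptual sketch only: rows "fetched" from the foreign table are joined
# with rows of a native table inside Postgres. All data below is invented,
# and `shipments` is a hypothetical native table.
warehouse_rows = [  # as if scanned from the MongoDB collection via mongo_fdw
    {"warehouse_id": 1, "warehouse_name": "UPS_NEW"},
    {"warehouse_id": 2, "warehouse_name": "EMS"},
]
shipment_rows = [  # hypothetical native PostgreSQL table
    {"shipment_id": 10, "warehouse_id": 1},
    {"shipment_id": 11, "warehouse_id": 2},
    {"shipment_id": 12, "warehouse_id": 1},
]


def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key` with a simple hash join."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    return [{**match, **row} for row in right for match in index.get(row[key], [])]


if __name__ == "__main__":
    for joined in hash_join(warehouse_rows, shipment_rows, "warehouse_id"):
        print(joined["shipment_id"], joined["warehouse_name"])
```

In SQL terms this corresponds to joining warehouse with the native table on warehouse_id; the planner, not the user, decides which join strategy is actually used.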
Live Walkthrough of the Hybrid Cluster
Leverage complex SQL with sharded MongoDB
Benefits of this Setup
- Build a sharded MongoDB cluster with a SQL interface
- Query MongoDB data using SQL
- Join MongoDB collections with each other or with tables in Postgres
- Combine and process MongoDB data with data from other data sources with the help of the respective FDW, e.g. Hadoop, Oracle, MySQL
- Add more shards on the go
- Add replicas for MongoDB on the go
- Use Postgres as a front end to insert/update/delete data in MongoDB using SQL
Send us your suggestions and questions
success@ashnik.com
Stay Tuned!
Website: www.ashnik.com
