noSQL vs SQL
Comparative usage of noSQL and SQL DBMS in enterprise applications
Contents
noSQL vs SQL
MongoDB in production
MongoDB + Oracle in production
Measuring performance
Querying: examples
Intro
A few words about me
• My name is Hanna Kaplun, I’m a QA Lead at Intellias
• More than 10 years of experience in testing
• and a few as a support specialist, technical writer, Oracle DB developer, BA
• Here’s my LinkedIn
• Yes, testing is my conscious choice
• Held a few trainings, meetups etc. for various audiences
• Wrote a number of articles about testing
• and data science using R
• Got myself a husband and two kids, 6 yrs and 1 yr respectively
noSQL vs SQL
SQL vs noSQL: main differences
SQL
• They are relational, i.e. the data can be normalized so that it is stored efficiently and takes less space
• There is a pre-defined schema, the structure in which the data is stored (defined with the DDL portion of SQL)
• Data is stored in tables with fixed columns
• Vertical scaling (buy a larger server)
• Multi-record transactions are supported
noSQL
• They are non-relational, so there is no normalization-based storage optimization
• The schema is dynamic, as the data may be unstructured
• Data structures may vary
– Document (JSON, i.e. trees)
– Key-value
– Wide-column (tables with dynamic columns)
– Graph (nodes and edges, typically a dict of lists)
• Horizontal scaling (buy more servers)
• Multi-record transactions are often limited or not supported (MongoDB has supported them since version 4.0)
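As a rough illustration of the contrast above, the sketch below stores the same data twice: once as two normalized tables (Python's built-in sqlite3 stands in for any SQL DBMS) and once as nested documents. The table names, field names and sample values are hypothetical.

```python
import sqlite3

# Relational side: a fixed schema; the author is stored once and referenced by key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
    INSERT INTO authors VALUES (1, 'Alice');
    INSERT INTO posts VALUES (10, 'First post', 1), (11, 'Second post', 1);
""")
rows = conn.execute(
    "SELECT p.title, a.name FROM posts p "
    "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
).fetchall()

# Document side: no join is needed, but the author is duplicated in every post,
# and individual documents may carry extra fields (here "tags").
documents = [
    {"id": 10, "title": "First post", "author": {"id": 1, "name": "Alice"}},
    {"id": 11, "title": "Second post", "author": {"id": 1, "name": "Alice"},
     "tags": ["sql"]},
]
doc_rows = [(d["title"], d["author"]["name"]) for d in documents]
```

Both variants yield the same (title, author) pairs; the difference is where the work goes: into a join on the relational side, into duplicated author data on the document side.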
SQL vs noSQL: advantages and disadvantages
SQL
• Advantages
– Normalization is available to optimize data storage
– Transactions to support ACID
– ANY data can be stored this way
• Disadvantages
– Joins and other complicated query constructs
– May be slow
– Additional interfaces are needed to communicate with the DBMS from a programming language
noSQL
• Advantages
– Fast
– Easy to understand and query, as the data is stored in its natural shape and dynamically, with no need to restructure it
– Easy to query from a programming language
• Disadvantages
– Possible data duplication
– Transactions are often missing or limited, so implementing ACID principles may be a challenge (usually it is done in application code, not in the DBMS)
SQL vs noSQL: DBMS examples
SQL
• Oracle
• MySQL
• MS SQL Server
• PostgreSQL
• SQLite
noSQL
• Document-based
– MongoDB
– CouchDB
• Key-value based
– DynamoDB
– Redis
• Wide-column based
– Cassandra
– HBase
• Graph based
– Neo4j
– Amazon Neptune
MongoDB in production
The application
• The general idea of the application was to sign documents with electronic signatures and to store the signature and the information about the document, but not the document itself
• The electronic signatures have two or three levels of verification, depending on the authority that issued the certificate used for signing
• The meta-data about the document varies: some fields are mandatory (document title, size, format), while others may or may not be present
• The non-mandatory fields may be the following: category, date of verification, author information (name, surname, company, job title etc.)
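A minimal sketch of what such meta-data might look like as documents, with a check for the mandatory fields. The field names and sample values are assumptions made for illustration; the talk does not show the actual structure.

```python
# Hypothetical names for the mandatory fields listed above: title, size, format.
MANDATORY_FIELDS = {"title", "size", "format"}

def missing_mandatory_fields(metadata: dict) -> set:
    """Return the mandatory fields that are absent from a metadata document."""
    return MANDATORY_FIELDS - set(metadata)

# A document with only the mandatory fields.
minimal_doc = {"title": "Contract", "size": 48213, "format": "pdf"}

# A document that also carries optional fields, including a nested author.
full_doc = {
    "title": "Invoice",
    "size": 1024,
    "format": "txt",
    "category": "finance",
    "verified_on": "2021-05-01",
    "author": {"name": "Jane", "surname": "Doe", "company": "ACME",
               "job_title": "Accountant"},
}

# A document that would have to be rejected.
broken_doc = {"title": "Untitled"}
```

With a dynamic schema both minimal_doc and full_doc can live in the same collection, so a check like missing_mandatory_fields has to be enforced somewhere, for example in the backend code.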
The application: conclusions
• The data to be stored has a tree structure (the certificate-issuing authorities, documents and their authors)
• The data is dynamic (some of the values may or may not be present, and the same goes for nodes in the tree)
• Initial DBMS proposal: Oracle (the team is familiar with it; any data can be stored in SQL format)
• Alternative DBMS proposal: MongoDB (dynamic schema, tree-like data, JSON as the default storage format)
The application: DBMS choice
Oracle
• Advantages
– The team knows it: we have a DBA and Oracle DB devs on the team
– Everything can be stored in SQL: whatever new data we need to store, it will be possible
• Disadvantages
– The DML queries became complicated almost from the start: once the DB schema was normalized (to BCNF, the Boyce-Codd normal form, also called 3.5NF), it turned out that we needed to write e.g. recursive queries to fetch the info we needed from tree-like data
– Only horizontal scaling was available, not vertical (customer’s request)
MongoDB
• Advantages
– Backend devs could just use it, no need for a DBA or a separate DB dev team
– Querying turned out to be easy and fast
– We could use MongoDB sharding for horizontal scaling
• Disadvantages
– Significant data duplication, so sharding was needed almost immediately after we deployed to production
– Further performance optimization was either exponential or index-based
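To make the "recursive queries" point concrete, here is a small sketch that fetches a chain of certificate authorities, first from a normalized table and then from a nested document. It uses Python's built-in sqlite3 as a stand-in: SQLite spells the query WITH RECURSIVE, whereas Oracle has its own syntax for hierarchical queries. The table, the field names and the three-level chain are hypothetical.

```python
import sqlite3

# Normalized side: each authority points at its parent authority.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authorities (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent_id INTEGER REFERENCES authorities(id)
    );
    INSERT INTO authorities VALUES
        (1, 'Root CA', NULL),
        (2, 'Intermediate CA', 1),
        (3, 'Issuing CA', 2);
""")

# Walk from the issuing authority up to the root with a recursive CTE.
chain_sql = """
    WITH RECURSIVE chain(id, name, parent_id, depth) AS (
        SELECT id, name, parent_id, 0 FROM authorities WHERE id = ?
        UNION ALL
        SELECT a.id, a.name, a.parent_id, c.depth + 1
        FROM authorities a JOIN chain c ON a.id = c.parent_id
    )
    SELECT name FROM chain ORDER BY depth
"""
relational_chain = [row[0] for row in conn.execute(chain_sql, (3,))]

# Document side: the same chain is simply nested inside the signature document.
signature_doc = {
    "document": {"title": "Contract"},
    "certificate": {
        "issuer": {"name": "Issuing CA",
                   "issuer": {"name": "Intermediate CA",
                              "issuer": {"name": "Root CA"}}},
    },
}

def issuer_chain(certificate: dict) -> list:
    """Follow nested "issuer" keys and collect the authority names."""
    names = []
    node = certificate.get("issuer")
    while node is not None:
        names.append(node["name"])
        node = node.get("issuer")
    return names

document_chain = issuer_chain(signature_doc["certificate"])
```

Both approaches return the chain from the issuing authority to the root; the document variant reads it in one fetch, at the price of repeating the authority data in every signature.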
MongoDB + Oracle in production
The application: extension
• The application described in the previous section had to be expanded: we were now supposed to store the documents themselves and to monitor user activity, so that potential data fraud, data loss or other types of attacks focused on the data could be detected
• Therefore, it was necessary to store the documents in addition to their metadata (quite a challenge, as the formats could vary from .txt to .jpg or .pdf)
• We also needed to add extensive logging and a complicated real-time log analysis system to detect possible threats and inform the relevant stakeholders about them
The application extension: conclusions
• There is no need to query the documents, just to store them and make sure they are intact
• The logs and the relevant business logic raise a few challenges
– The number of records grows alarmingly fast along with the number of users (approximately 1 000 000 records per day), and so does the size of the files that store them
– The log records are to be analyzed in real time, so the performance of this business logic implementation is of the essence
• The alerting mechanism should not only perform fast but also reach the responsible person as quickly as possible
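One common way to "make sure they are intact" is to keep a cryptographic fingerprint next to each stored document and compare it on every read. The sketch below assumes SHA-256 and hypothetical function names; the talk does not say how integrity was actually checked.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of the raw document bytes."""
    return hashlib.sha256(content).hexdigest()

def is_intact(content: bytes, stored_fingerprint: str) -> bool:
    """Compare the current content against the fingerprint saved at upload time."""
    return fingerprint(content) == stored_fingerprint

# The format does not matter (.txt, .jpg, .pdf): only the bytes are hashed.
original = b"%PDF-1.4 example document bytes"
saved = fingerprint(original)
tampered = original + b" extra"
```

Here is_intact(original, saved) holds, while is_intact(tampered, saved) does not, which is the kind of signal that could feed the alerting mechanism.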
The application extension: solutions
• We decided to store the documents in Oracle (which gives us fully functional transactions, batch processing and regular data backups)
• The main data architecture challenge of storing the data in a relational DBMS and its meta-data in a non-relational DBMS is to ensure the proper correspondence between the two (we wrote a correspondence layer for that purpose in a backend programming language)
• The logs were stored in MongoDB despite their pre-defined structure (log message type, log message, date and time). The reason was that the queries were relatively simple, filtering on date-time and message type, with further text analysis implemented as part of the backend source code (i.e. performance was of the essence, and it was good enough for the project needs)
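The log queries described above can be sketched in plain Python as a filter on message type and a date-time window, followed by text analysis in code. The field names, message types and sample records are hypothetical.

```python
from datetime import datetime

# Each log record has the pre-defined structure: type, message, date and time.
logs = [
    {"type": "LOGIN_FAILED", "message": "Wrong password for user 42",
     "timestamp": datetime(2021, 5, 1, 9, 59, 58)},
    {"type": "DOWNLOAD", "message": "Document 7 downloaded",
     "timestamp": datetime(2021, 5, 1, 10, 0, 5)},
    {"type": "LOGIN_FAILED", "message": "Wrong password for user 42",
     "timestamp": datetime(2021, 5, 1, 10, 0, 7)},
    {"type": "LOGIN_FAILED", "message": "Wrong password for user 42",
     "timestamp": datetime(2021, 5, 1, 11, 30, 0)},
]

def find_logs(records, message_type, start, end):
    """Select records of one type inside the half-open window [start, end).

    A MongoDB filter with the same meaning would be
    {"type": message_type, "timestamp": {"$gte": start, "$lt": end}}.
    """
    return [r for r in records
            if r["type"] == message_type and start <= r["timestamp"] < end]

def mentions(records, text):
    """Further text analysis done in backend code rather than in the DBMS."""
    return [r for r in records if text in r["message"]]

window = find_logs(logs, "LOGIN_FAILED",
                   datetime(2021, 5, 1, 9, 0), datetime(2021, 5, 1, 11, 0))
suspicious = mentions(window, "user 42")
```

Two failed logins fall into the window in this sample; a real-time analyzer would run such checks continuously and raise an alert above some threshold.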
Measuring performance
Measuring performance: a few hints
Oracle
• In Oracle, something like profiling can be used, which was the case for us: a few queries of interest were selected, and we measured the time it took to get the view with the results in different DB states (empty, average data amount, maximum foreseeable data amount)
• Useful links
– Measuring DB performance for Oracle
– Oracle DB performance tuning guide
MongoDB
• MongoDB has a DB profiler and a diagnostic log. It also has the explain() method, which shows the indexes applied and the execution stats for the query of interest (which is exactly what we used)
• Useful links
– Evaluating operations
– Performance in MongoDB
– Monitoring
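The measuring approach above (time a query of interest in several DB states, then look at the query plan) can be sketched with Python's built-in sqlite3 as a stand-in for either DBMS. The row counts, table and index names are made up, and SQLite's EXPLAIN QUERY PLAN only plays the role that explain() plays in MongoDB.

```python
import sqlite3
import time

def time_query(row_count: int) -> float:
    """Fill a fresh in-memory DB with row_count rows and time one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_id INTEGER PRIMARY KEY, age INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (age, status) VALUES (?, ?)",
        ((18 + i % 60, "A") for i in range(row_count)),
    )
    started = time.perf_counter()
    conn.execute("SELECT COUNT(*) FROM users WHERE age > 25").fetchone()
    elapsed = time.perf_counter() - started
    conn.close()
    return elapsed

# Three DB states: empty, an "average" amount and a "maximum foreseeable" amount.
timings = {n: time_query(n) for n in (0, 1_000, 100_000)}

# Ask the engine which index, if any, it would use for an equality lookup.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, age INTEGER, status TEXT)"
)
conn.execute("CREATE INDEX idx_users_age ON users(age)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE age = 45"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
```

The absolute numbers in timings depend on the machine; what matters is how they grow between the three states and whether plan_text mentions the index.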
Querying: samples
Querying: the differences
Oracle
• INSERT INTO users (user_id, age, status) VALUES ('bcd001', 45, 'A')
• SELECT * FROM users
• UPDATE users SET status = 'C' WHERE age > 25
• db.start_transaction()
cursor.execute(orderInsert, orderData)
cursor.execute(stockUpdate, stockData)
db.commit()
MongoDB
• db.users.insertOne({ user_id: 'bcd001', age: 45, status: 'A' })
• db.users.find()
• db.users.updateMany( { age: { $gt: 25 } }, { $set: { status: 'C' } } )
• s.start_transaction()
orders.insert_one(order, session=s)
stock.update_one(item, stockUpdate, session=s)
s.commit_transaction()
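The order-plus-stock transaction on the slide can be tried out without a server using Python's built-in sqlite3. This is only a sketch of the pattern, with hypothetical tables and quantities, not the project's code: either both statements are committed or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (
        item TEXT PRIMARY KEY,
        quantity INTEGER NOT NULL CHECK (quantity >= 0)
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        item TEXT NOT NULL,
        quantity INTEGER NOT NULL
    );
    INSERT INTO stock VALUES ('widget', 5);
""")

def place_order(item: str, quantity: int) -> bool:
    """Insert the order and decrease the stock as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (item, quantity) VALUES (?, ?)",
                (item, quantity),
            )
            conn.execute(
                "UPDATE stock SET quantity = quantity - ? WHERE item = ?",
                (quantity, item),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the order insert was rolled back too.
        return False

first = place_order("widget", 3)    # enough stock
second = place_order("widget", 10)  # would drive the stock below zero
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
remaining = conn.execute(
    "SELECT quantity FROM stock WHERE item = 'widget'"
).fetchone()[0]
```

After the second, failing call there is still exactly one order and the stock is unchanged, which is the atomicity that has to be rebuilt in application code when the DBMS offers no multi-record transactions.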
Querying Stackoverflow: the schema
• StackExchange Data Explorer
Thinking JSON: Post to User
• StackExchange Data Explorer
{"id": 3048642,
 "creationDate": "2010-06-15 20:07:34",
 "Body": "<p>You can try something like that :</p><...",
 ...,
 "Author": {"id": 12164792,
            "creationDate": "2019-10-04 13:40:14",
            "DisplayName": "q8cworu271",
            ...}
}
Thinking JSON: User to Post
• StackExchange Data Explorer
{"id": 12164792,
 "creationDate": "2019-10-04 13:40:14",
 "DisplayName": "q8cworu271",
 ...,
 "posts": [{"id": 3048642,
            "creationDate": "2010-06-15 20:07:34",
            "Body": "<p>You can try something like that :</p><...",
            ...},
           ...]
}
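The two shapes above hold the same information, embedded in opposite directions. A small sketch of turning post-to-user documents into user-to-posts documents follows; the key names follow the slides, while the ids and texts are made-up sample values.

```python
# Post-to-user shape: every post embeds its author.
posts = [
    {"id": 1, "Body": "<p>First answer</p>",
     "Author": {"id": 100, "DisplayName": "alice"}},
    {"id": 2, "Body": "<p>Second answer</p>",
     "Author": {"id": 200, "DisplayName": "bob"}},
    {"id": 3, "Body": "<p>Third answer</p>",
     "Author": {"id": 100, "DisplayName": "alice"}},
]

def embed_posts_in_users(post_documents: list) -> list:
    """Regroup post-to-user documents into user-to-posts documents."""
    users = {}
    for post in post_documents:
        author = post["Author"]
        # Create the user document on first sight, then keep appending posts.
        user = users.setdefault(author["id"], {**author, "posts": []})
        user["posts"].append({k: v for k, v in post.items() if k != "Author"})
    return list(users.values())

users = embed_posts_in_users(posts)
```

Which direction to embed depends on the dominant query: post-to-user suits "show a post with its author", user-to-posts suits "show everything a user wrote".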
Querying Stackoverflow: the examples
SQL
• select top 10 * from posts
• select * from users where DisplayName = 'q8cworu271'
• select * from users where DisplayName = 'q8cworu271' AND Reputation = 1
• select top 10 * from users order by CreationDate
MongoDB
• db.posts.find().limit(10)
• db.users.find( { "DisplayName": "q8cworu271" } )
• db.users.find( { $and: [ { "DisplayName": "q8cworu271" }, { "Reputation": 1 } ] } )
• db.users.find().sort( { "CreationDate": 1 } ).limit(10)
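In MongoDB, listing several fields in one filter document is an implicit AND, so the explicit $and in the third query is optional. The toy matcher below mimics that behaviour for equality filters only; it is a teaching sketch with hypothetical sample users, not how MongoDB evaluates queries.

```python
def matches(document: dict, query: dict) -> bool:
    """Return True if the document satisfies an equality-only, MongoDB-style filter."""
    for key, expected in query.items():
        if key == "$and":
            # Explicit AND: every sub-filter in the list must match.
            if not all(matches(document, sub) for sub in expected):
                return False
        elif document.get(key) != expected:
            # Implicit AND: every listed field must equal its expected value.
            return False
    return True

users = [
    {"DisplayName": "alice", "Reputation": 1,
     "CreationDate": "2019-10-04 13:40:14"},
    {"DisplayName": "alice", "Reputation": 50,
     "CreationDate": "2010-06-15 20:07:34"},
    {"DisplayName": "bob", "Reputation": 1,
     "CreationDate": "2015-01-01 00:00:00"},
]

explicit = [u for u in users
            if matches(u, {"$and": [{"DisplayName": "alice"}, {"Reputation": 1}]})]
implicit = [u for u in users
            if matches(u, {"DisplayName": "alice", "Reputation": 1})]

# Counterpart of sort({"CreationDate": 1}).limit(10): ascending order, first ten.
# Fixed-width date strings sort correctly as plain text.
oldest_first = sorted(users, key=lambda u: u["CreationDate"])[:10]
```

Both filters select the same single user, and oldest_first starts with the account created in 2010.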
THANK
YOU!
Hanna Kaplun, "noSQL vs SQL: a comparison of relational and non-relational DBMS usage in commercial applications", Kyiv QADay 2021
