First, we need a problem to solve.
This is us
We had the chance to be here and to implement OTNEngine
Displaying the analytics and data has some challenges
We need a way to store and retrieve the data that:
- Can run for a long time (as long as the session)
- Can write real-time traffic
- Can read huge traffic for analytics
- Has a catchy name
Our users want to monitor traffic and see the progress
So how do we monitor data and see the history of records?
Focal points for frame monitoring
• Number of frames generated
• What are the headers' most important values?
• Did it fit the allocated space?
• How long did the graph stall?
• Did the user get bored?
• How do these numbers compare to what is expected?
• How do these numbers measure up to the user's expectations?
This creates a lot of data
• Not all data is important for each user
• Frames sampled in one hour (approx.): 360K frames
• Sampled frames are trimmed to the header only, so the size reduces by 63
• Those headers are aggregated with some metadata, like page number
• After aggregation we are left with 360K headers per hour
• Each row has 256 measures of viewability metrics
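The figures above translate into a steady ingest rate. A quick back-of-the-envelope calculation, using only the two numbers on this slide and assuming (this is not stated in the deck) that the rate stays constant over a full day:

```python
# Back-of-the-envelope sizing from the figures on this slide.
HEADERS_PER_HOUR = 360_000   # sampled frame headers per hour (from the slide)
MEASURES_PER_ROW = 256       # measures stored with each row (from the slide)

headers_per_second = HEADERS_PER_HOUR / 3600
measures_per_hour = HEADERS_PER_HOUR * MEASURES_PER_ROW
measures_per_day = measures_per_hour * 24   # assumption: constant rate all day

print(f"{headers_per_second:.0f} headers/s")
print(f"{measures_per_hour:,} measures/hour")
print(f"{measures_per_day:,} measures/day")
```

So the store has to absorb roughly 100 rows per second, continuously, while still answering analytics reads.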
Real-time accumulation and aggregation
Consume frame headers as fast as possible
Serve API calls continuously
Single machine with
- 6 physical cores
- Low latency
- Overprovisioned
What didn’t work: MySQL
• Original DB choice
• Performed very well when daily volume was < 1% of current volume
• Impossible to add new columns to the database
What didn’t work: LFS
• Very fast compared to any DB
• Not optimized for random access
• Needed local disk with very fast I/O
What did work: Postgres
• Fault tolerance against interrupts
• Can have a dynamic schema (JSON data type)
• Supports multiple connections
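The dynamic-schema point can be illustrated without a running database. In this minimal sketch, headers with different fields are serialized to JSON text that could all be stored in one `jsonb` column, so a new field needs no `ALTER TABLE`. The table name `frame_headers`, its columns, and the header fields are hypothetical; the deck does not show the real schema.

```python
import json

# Hypothetical DDL: fixed columns plus one jsonb column for everything else.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS frame_headers (
    id      bigserial PRIMARY KEY,
    page    integer NOT NULL,
    header  jsonb   NOT NULL
);
"""

def to_row(page: int, header: dict) -> tuple[int, str]:
    """Turn one header into (page, json_text), ready for a jsonb parameter."""
    return page, json.dumps(header, sort_keys=True)

# Two headers with different fields: both fit the same table unchanged.
old_style = to_row(1, {"frame_no": 17, "payload_type": 2})
new_style = to_row(1, {"frame_no": 18, "payload_type": 2, "stall_ms": 12})

print(old_style)
print(new_style)
```

On the Postgres side, a field that only some rows carry can still be queried, for example with `SELECT header->>'stall_ms' FROM frame_headers`, which returns NULL for rows that lack it.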
Main Flow
DB fast?
• No loops needed (used an array to insert records)
• Connection pool (for parallel writes to the DB)
• Non-blocking code (used async + threads)
• Prepared statements (fancy words)
• On a 2.9 GHz 6-core Intel Core i9:
• Write 10k frames in 40 ms
• Read 10k frames in 50 ms
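The first three techniques above can be sketched together. This is not the project's code: `FakeConnection` is a hypothetical stand-in for a real pooled connection, the table `frame_headers` is assumed, and the pool and batch sizes are arbitrary. The SQL uses PostgreSQL-native `$1` placeholders and passes the whole batch as one `jsonb[]` array, so there is no per-row loop of INSERTs.

```python
import asyncio
import json

# Hypothetical statement: one array parameter carries the whole batch.
BULK_INSERT_SQL = (
    "INSERT INTO frame_headers (header) "
    "SELECT unnest($1::jsonb[])"
)

def chunked(items: list, size: int) -> list[list]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def encode_batch(headers: list[dict]) -> list[str]:
    """Serialize each header to JSON text; the list becomes the $1 array."""
    return [json.dumps(h) for h in headers]

class FakeConnection:
    """Stand-in for a pooled connection; records what would be sent."""
    def __init__(self) -> None:
        self.calls: list[tuple[str, int]] = []

    def execute(self, sql: str, array_param: list[str]) -> None:
        # A real driver call would block on network I/O here.
        self.calls.append((sql, len(array_param)))

async def write_all(headers: list[dict], pool_size: int = 4,
                    batch_size: int = 2500) -> list[FakeConnection]:
    """Write batches in parallel, using at most pool_size connections."""
    pool: asyncio.Queue = asyncio.Queue()
    connections = [FakeConnection() for _ in range(pool_size)]
    for conn in connections:
        pool.put_nowait(conn)

    async def write_batch(batch: list[dict]) -> None:
        conn = await pool.get()            # borrow a connection
        try:
            # Run the blocking call in a worker thread so the event loop
            # stays free to serve other work.
            await asyncio.to_thread(conn.execute, BULK_INSERT_SQL,
                                    encode_batch(batch))
        finally:
            pool.put_nowait(conn)          # return it to the pool

    await asyncio.gather(*(write_batch(b) for b in chunked(headers, batch_size)))
    return connections

headers = [{"frame_no": n} for n in range(10_000)]
connections = asyncio.run(write_all(headers))
total_rows = sum(n for conn in connections for _, n in conn.calls)
print(total_rows)
```

With a real driver, `FakeConnection.execute` would be replaced by the driver's own call, and a natively asynchronous driver would make the worker thread unnecessary.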
Saving Frames Payload
What didn’t work: compressing the payload
• Our Data is pRPS
• Slower in both read and write
• From 40 ms to 250 ms, for the same size
• With this rate it is estimated to have 200GB in
24
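One way to check whether a payload is worth compressing is to measure both the size ratio and the time cost on a representative sample. The sketch below uses `zlib` from the Python standard library; the deck does not say which compressor was tried. Random bytes are used here as an assumed stand-in for a payload with little redundancy, and the zero-filled buffer is only a contrast case.

```python
import os
import time
import zlib

def measure(payload: bytes, level: int = 6) -> tuple[float, float]:
    """Return (compressed/original size ratio, seconds spent compressing)."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: compression must be lossless.
    assert zlib.decompress(compressed) == payload
    return len(compressed) / len(payload), elapsed

# Assumption: random bytes stand in for a payload with little redundancy.
high_entropy = os.urandom(1_000_000)
# Contrast case: highly repetitive data compresses very well.
repetitive = b"\x00" * 1_000_000

ratio_random, secs_random = measure(high_entropy)
ratio_zeros, secs_zeros = measure(repetitive)
print(f"random: ratio={ratio_random:.3f}, {secs_random * 1000:.1f} ms")
print(f"zeros:  ratio={ratio_zeros:.3f}, {secs_zeros * 1000:.1f} ms")
```

If the ratio on real frames stays close to 1.0, compression only adds latency, which would be consistent with the result reported on this slide.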

whyPostgres, a presentation on the project choice for a storage system
