SCALING A SAAS BACKEND 
WITH POSTGRESQL – A CASE STUDY 
PostgreSQL Conference Europe 
Madrid 2014-10-24 
Oliver Seemann - Bidmanagement GmbH 
oliver.seemann@adspert.net
Growing Data 
Gigabytes → Terabytes
We do productivity tools for 
advertisers
Significant amounts of data
Upper boundary: 
5M keywords × 365 days 
× 20 bigints/doubles 
≅ 300GB
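For the record, the arithmetic behind that estimate (8 bytes per bigint/double, payload only, before indexes and row overhead):

  5,000,000 × 365 × 20 × 8 bytes ≈ 292 GB ≈ 300 GB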
OLTP / OLAP Duality
“Slow” OLAP data for daily batch-processing 
jobs
“Fast” OLTP data for 
human interaction
Initially separate databases 
Slow 
Data 
Fast 
Data
Data overlaps significantly 
Slow 
Data 
Fast 
Data
We went with unified approach 
Slow 
Data 
Fast 
Data
Currently: 
7 machines running PG 9.3
Currently: 
~3 TB Data
Currently: 
largest table: ~100GB
How it all started…
It began as an experiment
Design by the book 
Keywords 
PK,FK1 adgroup_id 
PK keyword_id 
Campaign 
PK campaign_id 
FK1 account_id 
Adgroup 
PK adgroup_id 
FK1 campaign_id 
Account 
PK account_id 
FK1 customer_id 
Customer 
PK customer_id 
User 
PK user_id 
FK1 customer_id 
History 
PK day 
PK,FK1 keyword_id 
PK,FK1,FK2 adgroup_id 
UserAccountAccess 
PK,FK1 account_id 
PK,FK2 user_id 
Scenario 
PK,FK1 keyword_id 
PK,FK1 adgroup_id 
PK factor
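A minimal DDL sketch of this by-the-book schema (tables and keys follow the diagram; the column types are assumptions, and User and Scenario are omitted for brevity):

  CREATE TABLE customer (customer_id bigint PRIMARY KEY);

  CREATE TABLE account (
      account_id  bigint PRIMARY KEY,
      customer_id bigint NOT NULL REFERENCES customer
  );

  CREATE TABLE campaign (
      campaign_id bigint PRIMARY KEY,
      account_id  bigint NOT NULL REFERENCES account
  );

  CREATE TABLE adgroup (
      adgroup_id  bigint PRIMARY KEY,
      campaign_id bigint NOT NULL REFERENCES campaign
  );

  CREATE TABLE keyword (
      adgroup_id  bigint REFERENCES adgroup,
      keyword_id  bigint,
      PRIMARY KEY (adgroup_id, keyword_id)
  );

  CREATE TABLE history (
      day         date,
      adgroup_id  bigint,
      keyword_id  bigint,
      -- plus ~20 bigint / double precision metric columns
      PRIMARY KEY (day, adgroup_id, keyword_id),
      FOREIGN KEY (adgroup_id, keyword_id) REFERENCES keyword
  );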
Soon tens of GB 
>100M records
All Accounts 
Account 1 – Rec 1 
Account 2 – Rec 1 
Account 1 – Rec 2 
Account 3 – Rec 1 
Account 2 – Rec 2 
Account 2 – Rec 3 
Account 1 – Rec 3 
Account 3 – Rec 2
~10-fold increase per level 
Account >10 
→ Campaign >1k 
→ Ad Group >100k 
→ Keyword >10M 
→ History >100M
Partitioning, somehow 
Account 1 
Account 1 – Rec 1 
Account 1 – Rec 2 
Account 1 – Rec 3 
Account 2 
Account 2 – Rec 1 
Account 2 – Rec 2 
Account 2 – Rec 3 
Account 3 
Account 3 – Rec 1 
Account 3 – Rec 2 
Account 3 – Rec 3
Partitioning with inheritance 
Parent 
Child Child Child 
check-constraints 
SELECT 
INSERT
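For reference, a sketch of how inheritance partitioning looks in PG 9.x (table and column names are illustrative, not our schema):

  CREATE TABLE history (day date, keyword_id bigint, clicks bigint);

  -- One child per month, with a CHECK constraint the planner can
  -- use to skip irrelevant children (constraint exclusion):
  CREATE TABLE history_2014_10 (
      CHECK (day >= DATE '2014-10-01' AND day < DATE '2014-11-01')
  ) INHERITS (history);

  -- SELECTs go against the parent; INSERTs must be routed to the
  -- right child, typically with a trigger:
  CREATE FUNCTION history_insert() RETURNS trigger AS $$
  BEGIN
      INSERT INTO history_2014_10 VALUES (NEW.*);
      RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER history_insert_trg
      BEFORE INSERT ON history
      FOR EACH ROW EXECUTE PROCEDURE history_insert();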
PG Partitioning is nifty – 
but not a match for our case
Our case: 
Little to no shared data between 
clients
Isolate accounts 
One DB, or many DBs/schemas?
Both approaches: 
+ Good horizontal scaling
Both approaches: 
+ Good tool support 
(e.g. pg_dump/restore)
Partition into databases: 
+ Easy cloning 
CREATE DATABASE foo TEMPLATE bar;
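(One caveat: the template database must have no other open connections while the copy runs, or CREATE DATABASE fails.)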
Partition into databases: 
+ Stricter isolation (security)
Partition into databases: 
- Some Overhead
Partition into databases: 
- No direct references
Partition into schemas: 
+ More lightweight
Partition into schemas: 
+ Full references
Partition into schemas: 
- No easy cloning
Partition into schemas: 
- No cascading schemas
Now: 
Several thousand databases 
on five 1TB machines
Now: 
Plus main DB server pair 
with <10GB data
Setup 
Main DB Hosts: master + slave 
Account DB Hosts: standalone-0, standalone-1, 
standalone-2, standalone-3
No replication on 
account db hosts?
Performance Problems
Too many concurrent 
full table scans
From 300 MB/s to 30 MB/s: 
concurrent scans turn sequential reads into random reads 
More concurrent queries → longer query runtimes
Different apps, 
different access patterns 
Web Apps: many small/fast queries 
Compute Cluster: few very slow/big queries
Limit concurrent access 
with counting semaphore 
Web Apps: many small/fast queries 
Compute Cluster: few very slow/big queries
Implement Semaphore using 
Advisory Locks
Simpler than setting up 
Zookeeper
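A sketch of how such a counting semaphore can be built on pg_try_advisory_lock (the lock class 42, slot count, and function name are made up; a session holds its slot until it unlocks or disconnects):

  -- Grab one of max_slots semaphore slots, or give up (non-blocking).
  CREATE FUNCTION try_acquire_slot(max_slots int) RETURNS int AS $$
  DECLARE
      slot int;
  BEGIN
      FOR slot IN 1..max_slots LOOP
          IF pg_try_advisory_lock(42, slot) THEN
              RETURN slot;   -- got a slot, go ahead with the big query
          END IF;
      END LOOP;
      RETURN NULL;           -- all slots busy: wait and retry
  END;
  $$ LANGUAGE plpgsql;

  -- Caller: SELECT try_acquire_slot(4);
  -- ... run the expensive query ...
  -- then SELECT pg_advisory_unlock(42, <acquired slot>);

Since advisory locks are shared across the whole cluster, this caps concurrency per host rather than per database, which is what is needed here.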
More performance problems: 
Bulk Inserts
Solved with common 
best practices:
COPY exclusively
Drop / Recreate indexes
COPY to new table + swap
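A sketch of the COPY-to-new-table-plus-swap pattern (table and index names are illustrative):

  BEGIN;
  CREATE TABLE history_new (LIKE history INCLUDING DEFAULTS);
  COPY history_new FROM STDIN (FORMAT csv);    -- bulk load, no indexes yet
  CREATE INDEX ON history_new (keyword_id);    -- recreate indexes after loading
  ALTER TABLE history RENAME TO history_old;   -- swap atomically in one transaction
  ALTER TABLE history_new RENAME TO history;
  COMMIT;
  DROP TABLE history_old;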
Another problem:
CREATE DATABASE 
can take a while
Signup Delays 
Signup → Web App → CREATE DATABASE 
can take 5–15 min
CREATE DATABASE 
performs a CHECKPOINT
Solution: 
Keep stock of 
spare databases
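A sketch of the spare-pool trick (database names are made up): create empty account databases ahead of time, so claiming one at signup is just a rename:

  -- Background job, run when the system is quiet:
  CREATE DATABASE spare_001 TEMPLATE account_template;

  -- At signup, instead of a slow CREATE DATABASE:
  ALTER DATABASE spare_001 RENAME TO account_4711;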
In general: 
Very happy with our approach
Databases are tangible
Move DBs between hosts
Painless 9.0 -> 9.3 migration
Use schemas as partitions?
Would prevent 
regular schema usage
CREATE SCHEMA foo; 
CREATE SCHEMA foo.bar; 
CREATE SCHEMA foo.bar.baz;
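(Only the first statement is valid; PostgreSQL has no nested schemas, so the qualified forms are wishful syntax.)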
Schemas are crucial for us
Versioning of database code
Grown to about 
~15k SQL functions/views
Moved core algorithms 
from app to db
Previously: 
1. Read bulk raw data from DB 
2. Number crunching in app 
3. Write bulk results to DB
4x-10x faster in DB 
2x-4x RAM reduction
SQL is harder to read & write
How to test it? 
App test suite 
goes a long way.
Different production stages 
Versioning with schemas
Every 4-8 weeks: 
CREATE SCHEMA version_%d;
Assign each version a stage: 
unstable -> testing -> stable
Stage     App      Schema       Share of accounts 
unstable  v22.4    version_22    0%–2% 
testing   v21.13   version_21    1%–50% 
stable    v20.19   version_20   50%–100%
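The app then selects its matching code version via the search path (a sketch; version numbers as in the table above):

  -- App release v21.x pins itself to the matching schema:
  SET search_path TO version_21, public;
  -- Unqualified function/view references now resolve to that version.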
Watchdogs on key metrics 
alert on suspicious behaviour
Schemas are VERY important
Takeaway: 
Databases work well 
as partitions
Takeaway: 
So can schemas
Takeaway: 
Schemas can be used 
for versioning
Thanks for listening 
Questions?
Managing Schema Changes
ORM 
Can’t live with it, 
can’t live without it
PG Wishes

Editor's Notes

  • #2: Hi, I’m Oliver, I’m a software developer, currently heading the development team at Bidmanagement GmbH in Berlin.
  • #3: I’m going to talk about how we’re using PostgreSQL as the main datastore in our system. None of the solutions or approaches are new in themselves, but by using some PG features in a non-standard way, certain problems can be solved quite elegantly. And seeing that this works very well for us, maybe it will be helpful to some of you when you face similar problems, now or in the future.
  • #4: Mostly in the area of search engine marketing, which today is mostly AdWords; however, we also support other networks, for example Yandex. Our flagship product is a fully automatic bid management solution. Every day we change the bids on tens of millions of keywords and learn from the effects to steer campaign performance towards goals configured by the user. The philosophy is to take the mind-numbing number-crunching tasks away from the user, because a computer can do it better and much more efficiently, especially when you have thousands or millions of objects to analyze.
  • #5: We replicate the campaign structure and provide a reporting interface. I don’t want to bore you with the technical details of how search engine marketing works, so let’s just say we store a lot of ints and floats, and especially time series of those. To get an idea of the ballpark we’re working in, let’s have a look at the upper boundary.
  • #6: Ballpark estimates, upper bound. Time series data, hierarchical data: clicks, impressions, and also lots of statistical projections with confidence intervals. Of course most of those values are actually zero and can be omitted when storing the data, so it may actually only be 5 or 10% of that. However, we have thousands of accounts, most of which only have a few hundred MB to a few GB. But the occasional outlier with 100GB must work just as well.
  • #7: The different kinds of data we store can be largely separated into two groups.
  • #10: One internal (batch processing), One external (web app access)
  • #11: Mostly the time series data. So we had to either duplicate lots of data and synchronize changes, or integrate both into one and make sure different parts of the system don’t get in each other’s way.
  • #12: We opted for the latter because it makes for a simpler system; we just have to make sure the different parts stay out of each other’s way. So far it has turned out well and we haven’t looked back.
  • #17: Let’s have a peek into the past in order to understand how the system evolved.
  • #18: Our CTO is a mathematician Skunk works project
  • #25: PostgreSQL supports partitioning via inheritance [insert scheme]. Use CHECK constraints to tell the query planner where to look. You cannot insert into the parent table; you must insert into the right child table. A lot of effort goes into application logic. Tried it on one table, weren’t convinced.
  • #30: The database or schema as a logical unit is a central part of PG, with good tool support. Easy to add, easy to drop. Can be backed up, restored, and moved between machines. Very tangible from an ops view.
  • #41: MainDB is still replicated, to enable quick failover; here we can’t afford extended downtime.
  • #42: Can make availability / cost trade offs here
  • #43: Big cheap HDDs. Bottleneck is Gigabit Ethernet.
  • #44: Capacity doubled, cost reduced 40%. The more servers, the faster the restore; Gbit Ethernet on the backup server is the limiting factor.
  • #45: Not really feasible: we rewrite lots of data every day (crude approach, but simpler code), and it would mean complex administration (no dedicated DBA).
  • #49: From sequential reads to random reads The cause of the problem is only on one side ..
  • #50/#51: Webapp queries with humans waiting are quite fast; the problematic queries are done by the analysis jobs: frequent full table scans, queries with huge results. We need a way to synchronize queries and control concurrency. Could use a connection pooler, or an external synchronization mechanism, e.g. Zookeeper.
  • #52: Very simple mechanism. Unfair, but that’s no problem.
  • #53: However, it’s starting to spread, with a tendency to be misused.
  • #57: An ALTER INDEX foo DISABLE would come in handy;
  • #61: We added a self-service signup: a 2-minute process to add an AdWords account to the system (OAuth → user info → optimization bootstrap). Biggest problem: CREATE DATABASE can take several minutes, depending on the current amount of write activity. A more granular (per-db) checkpoint would be cool?
  • #63: Restrict checkpoints to databases?
  • #64: So all of the drawbacks that came up could be worked around, more or less elegantly. In total, we’re very happy with the way the approach has turned out. Especially the scalability and isolation aspects have left us very pleased. So much so, in fact, that we also used it for a second product, where it feels very natural.
  • #65: Databases as a unit of abstraction on a client or account level are very tangible, which makes them comfortable both from a development and an operations point of view. They can be connected to, renamed, cloned, copied, moved, backed up and restored. When we remove a customer from the system we just dump the account databases and put them on S3 Glacier for some amount of time, instead of keeping the 100GB in the system.
  • #66: To manage capacity. Currently this is still a manual process because it’s not required very often. Making it automatic would require, amongst other things, a means to briefly prevent the app from connecting. Does “ALTER DATABASE set CONNECTION LIMIT 0” work?
  • #67: Moving between hosts means we can also move databases between PG versions. We upgraded from 9.0 to 9.3 without much effort by installing both on all machines and then dumping the databases one after another from 9.0 and restoring into 9.3, over a period of 2-3 months. Memory is not a problem, as shared_buffers is relatively low (a few gigabytes), most memory is used by page and buffer cache, and all files continue to exist only once. We used 9.3 in development for a few months. Btw, I only remember one case where we needed to adapt code for 9.3, something with the order of a query result. Otherwise the upgrade was a breeze.
  • #68: But even though this works very well using databases as partitions: would schemas have worked the same way?
  • #69: The biggest problem we would have had is that we wouldn’t have been able to use schemas for other purposes anymore.
  • #72: This has become necessary.
  • #74: It has grown quite a bit because we started with lots of Perl code and a “dumb” data store
  • #75: Up to 100GB memory in step 2
  • #76: Only works when we can limit the concurrent batch jobs per machine (advisory locks).
  • #77: But it’s not all sunshine and rainbows with that approach, of course, because SQL is much harder to write and read than procedural code. The notion that “code is a liability” has some truth to it: the more we move into the database, the harder it becomes to manage. Python is just much more tangible and malleable than SQL. We have to compromise between easy to debug & test and performance.
  • #78: But, given a bit of time and quiet, one can accomplish much with little code in SQL. Testing of individual snippets can be done by calling them from the application code, as part of an integrated test suite that has test data and expects certain results. Covering most of the code in tests is not the problem, but covering most data scenarios is much more work (div by zero sneaks in from time to time). Those cases are postponed to …
  • #79: The SQL code decides how to spend millions of euros in advertising money every month. We can’t afford to deploy any code changes (app or db) to all account databases at the same time, so we use schemas to manage multiple versions of the optimization code.
  • #80: The schema is filled with all the objects from a set of source .sql files. The application software version and the db schema version it uses are identical; the app sets the search path. We don’t use minor versions for fixes in the db code.
  • #81: What we do is assign each version a stage: unstable, testing, stable, borrowed from Debian. And we can also assign individual client accounts a stage.
  • #82: Typically test accounts, or ones with a pathological case that is fixed by the new release. Those are closely monitored (performance, errors, log files, debugging data). Brand-new unstable: few, selected (test-)accounts. Testing stage for incremental roll-out.