In-memory databases offer
                     significant gains in               But three issues
                     performance as all data is         have stunted their
                     freely available. There is no      uptake: Address
Traditional disk-                                       spaces only being
                     need to page to and from disk.
oriented database                                       large enough for a
architecture is                                         subset of a typical
showing its age.    This makes joins a                  user’s data. The ‘one
                    problem. When data                  more bit’ problem
                    must be joined across               and durability.
                    multiple machines
                    performance degradation
 Snowflake          is inevitable.                    Distributed in-
 Schemas allow                                        memory databases
 us to mix             But this model only goes       solve these three
 Partitioning          so far. “Connected             problems but at the
 and Replication       Replication” takes us a        price of loosing the
 so joins never        step further allowing us       single address space.
 hit the wire.         to make the best possible
                       use of replication.
The lay of the land: 
  The main architectural
constructs in the database
         industry
Shared
 Disk
ms
   μs
       ns
           ps

1MB Disk/Network
        1MB Main Memory


          0.000,000,000,000
Cross Continental    Main Memory
                  L1 Cache Ref
Round Trip
          Ref
         Cross Network             L2 Cache Ref
         Round Trip
       * L1 ref is about 2 clock cycles or 0.7ns. This is
                           the time it takes light to travel 20cm
Distributed Cache
Taken from “OLTP Through
the Looking Glass, and What
We Found There”
Harizopoulos et al
Shared
                                  Nothing            

                                Teradata, Vertica,
                                 Greenplumb…


                                                                            SN
 Regular                        In-Memory
                                                                        In-Memory
Database
                        Database
                  Drop Disk
                                            Exasol, VoltDB,
Oracle, Sybase,
    MySql
                                Times Ten, HSQL,
                                      KDB
                                                          Distribute
       Hana


                                                                    ODC
                                Distributed
                                 Caching
                               Coherence, Gemfire,
                                  Gigaspaces
Distributed Architecture


 Simplify the Contract.


     Stick to RAM
450 processes
      2TB of RAM
                                         Oracle 
                                        Coherence




Messaging (Topic Based) as a system of record
                (persistence)
Access Layer      Java      Java
                    client
   client

                     API
      API
 Query Layer




                                        Transactions
  Data Layer




                                              Mtms

                                           Cashflows
Persistence Layer
Indexing




Partitioning
               Replication
But your storage is limited by
the memory on a node
Keys Fs-Fz
     Keys Xa-Yd




Scalable storage, bandwidth
and processing
Trader
            Party
         Version 1
        Trade
                 Trader
                           Party
         Version 2
                       Trade
                                Trader
                                          Party
         Version 3
                                      Trade
                                               Trader
                                                         Party
   Version 4
                                                     Trade
…and you need
versioning to do MVCC
Trade
            Trader
         Party


         Party
   Trader
Trade


         Party
                  Trader

Trade
         Party
So better to use
partitioning, spreading
   data around the
         cluster.
Trader
                             Party

                         Trade




Trade
   Trader
                      Party
Trader
                             Party

                         Trade




Trade
   Trader
                      Party
!
This is what using Snowflake Schemas and
  the Connected Replication pattern is all
                   about!
Crosscutting
   Keys




 Common
  Keys
Replicated
Trader
                   Party


          Trade
                            Partitioned
Valuation Legs

             Valuations

art Transaction Mapping

     Cashflow Mapping                                                  Facts:
             Party Alias

            Transaction
                                                                       =>Big, 
             Cashflows                                                 common
                   Legs

                 Parties
                                                                       keys
           Ledger Book

           Source Book
                                                                       Dimensions
            Cost Centre

                Product                                                =>Small,
  Risk Organisation Unit

          Business Unit
                                                                       crosscutting 
             HCS Entity                                                Keys
           Set of Books

                           0   37,500,000   75,000,000   112,500,000          150,000,000
Coherence’s
                     KeyAssociation
                      gives us this
Trades
      MTMs



          Common
            Key
Replicated
Trader
                   Party


          Trade
                            Partitioned
                                 (
Query Layer
Trader
           Party

      Trade




                       Transactions




                                       Data Layer
                            Mtms

                         Cashflows



                    Fact Storage
                    (Partitioned)
Dimensions
                   (repliacte)

   Transactions

        Mtms
                      Facts
     Cashflows
                  (distribute/
                    partition)
Fact Storage
(Partitioned)
Valuation Legs

             Valuations



                                                                       Facts:
art Transaction Mapping

     Cashflow Mapping

             Party Alias
                                                                       =>Big
                                                                       =>Distribute
            Transaction

             Cashflows

                   Legs

                 Parties

           Ledger Book

           Source Book                                                 Dimensions
                                                                       =>Small 
            Cost Centre

                Product

  Risk Organisation Unit
                                                                       => Replicate
          Business Unit

             HCS Entity

           Set of Books

                           0   37,500,000   75,000,000   112,500,000         150,000,000
We use a variant on a
   Snowflake Schema to
 partition big stuff, that has
the same key and replicate
     small stuff that has
     crosscutting keys.
Replicate




Distribute
Select Transaction, MTM, ReferenceData From
MTM, Transaction, Ref Where Cost Centre = ‘CC1’
Select Transaction, MTM, ReferenceData From
MTM, Transaction, Ref Where Cost Centre = ‘CC1’


                                      LBs[]=getLedgerBooksFor(CC1)
                                      SBs[]=getSourceBooksFor(LBs[])
                                      So we have all the bottom level
                                      dimensions needed to query facts



                                   Transactions


                                         Mtms


                                     Cashflows



                                Partitioned
Select Transaction, MTM, ReferenceData From
MTM, Transaction, Ref Where Cost Centre = ‘CC1’


                                      LBs[]=getLedgerBooksFor(CC1)
                                      SBs[]=getSourceBooksFor(LBs[])
                                      So we have all the bottom level
                                      dimensions needed to query facts



                                   Transactions

                         Get all Transactions and
                                         Mtms
                         MTMs (cluster side join) for
                         the passed Source Books
                                      Cashflows



                                Partitioned
Select Transaction, MTM, ReferenceData From
                      MTM, Transaction, Ref Where Cost Centre = ‘CC1’


Populate raw facts                                          LBs[]=getLedgerBooksFor(CC1)
(Transactions) with                                         SBs[]=getSourceBooksFor(LBs[])
dimension data
                                                            So we have all the bottom level
before returning to
                                                            dimensions needed to query facts
client.


                                                         Transactions

                                               Get all Transactions and
                                                               Mtms
                                               MTMs (cluster side join) for
                                               the passed Source Books
                                                            Cashflows



                                                      Partitioned
Replicated                  Partitioned
                  Java
                  client


Dimensions
                   Facts
                   API




We never have to do a distributed join!
So all the big stuff is
  held paritioned



   And we can join
without shipping keys
  around and having
 intermediate results
Trader
                             Party

                         Trade




Trade
   Trader
                      Party
Trader
          Party
         Version 1
      Trade
               Trader
                         Party
         Version 2
                     Trade
                              Trader
                                        Party
         Version 3
                                    Trade
                                             Trader
                                                       Party
   Version 4
                                                   Trade
Trade
            Trader
         Party


         Party
   Trader
Trade


         Party
                  Trader

Trade
         Party
Valuation Legs

            Valuations

rt Transaction Mapping

    Cashflow Mapping

            Party Alias
                                                       Facts
           Transaction

            Cashflows

                  Legs

                Parties       This is a dimension
          Ledger Book
                               •  It has a different
          Source Book

           Cost Centre            key to the Facts.
   Dimensions
               Product
                               •  And it’s BIG
 Risk Organisation Unit

         Business Unit

            HCS Entity

          Set of Books

                          0                                     125,000,000
Party Alias



               Parties



         Ledger Book



         Source Book



          Cost Centre



              Product



Risk Organisation Unit



        Business Unit



           HCS Entity



         Set of Books


                         0   1,250,000   2,500,000   3,750,000   5,000,000
Party Alias



               Parties



         Ledger Book



         Source Book



          Cost Centre



              Product



Risk Organisation Unit



        Business Unit



           HCS Entity



         Set of Books


                         20   1,250,015   2,500,010   3,750,005   5,000,000
So we only replicate
‘Connected’ or ‘Used’
     dimensions
Processing Layer
                       Dimension Caches
                          (Replicated)


                                    Transactions




                                                     Data Layer
As new Facts are added                    Mtms
relevant Dimensions that
they reference are moved
                                      Cashflows
to processing layer caches


                                 Fact Storage
                                 (Partitioned)
Query Layer
     Save Trade
                     (With connected
                                     dimension Caches)

                                     Data Layer
Cache
              Trade
                 (All Normalised)
Store

                                            Partitioned 
              Trigger
   Source             Cache
     Party                        Ccy
     Alias
               Book
Query Layer
                              (With connected
                              dimension Caches)

                              Data Layer
         Trade
               (All Normalised)



Party             Source   Ccy
Alias
             Book
Query Layer
                                        (With connected
                                        dimension Caches)

                                        Data Layer
         Trade
                         (All Normalised)



Party              Source            Ccy
Alias
              Book
 

          Party
            Ledger
           
                 Book
‘Connected Replication’
    A simple pattern which
recurses through the foreign
 keys in the domain model,
 ensuring only ‘Connected’
  dimensions are replicated
Java
               client

Java schema
    API
     Java ‘Stored
                          Procedures’
                         and ‘Triggers’
Partitioned
 Storage
Balancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java Database

Balancing Replication and Partitioning in a Distributed Java Database

  • 2.
    In-memory databases offer significant gains in But three issues performance as all data is have stunted their freely available. There is no uptake: Address Traditional disk- spaces only being need to page to and from disk. oriented database large enough for a architecture is subset of a typical showing its age. This makes joins a user’s data. The ‘one problem. When data more bit’ problem must be joined across and durability. multiple machines performance degradation Snowflake is inevitable. Distributed in- Schemas allow memory databases us to mix But this model only goes solve these three Partitioning so far. “Connected problems but at the and Replication Replication” takes us a price of loosing the so joins never step further allowing us single address space. hit the wire. to make the best possible use of replication.
  • 5.
    The lay ofthe land: The main architectural constructs in the database industry
  • 7.
  • 10.
    ms μs ns ps 1MB Disk/Network 1MB Main Memory 0.000,000,000,000 Cross Continental Main Memory L1 Cache Ref Round Trip Ref Cross Network L2 Cache Ref Round Trip * L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm
  • 19.
  • 21.
    Taken from “OLTPThrough the Looking Glass, and What We Found There” Harizopoulos et al
  • 23.
    Shared Nothing Teradata, Vertica, Greenplumb… SN Regular In-Memory In-Memory Database Database Drop Disk Exasol, VoltDB, Oracle, Sybase, MySql Times Ten, HSQL, KDB Distribute Hana ODC Distributed Caching Coherence, Gemfire, Gigaspaces
  • 24.
    Distributed Architecture Simplifythe Contract. Stick to RAM
  • 25.
    450 processes 2TB of RAM Oracle Coherence Messaging (Topic Based) as a system of record (persistence)
  • 26.
    Access Layer Java Java client client API API Query Layer Transactions Data Layer Mtms Cashflows Persistence Layer
  • 27.
  • 28.
    But your storageis limited by the memory on a node
  • 29.
    Keys Fs-Fz Keys Xa-Yd Scalable storage, bandwidth and processing
  • 33.
    Trader Party Version 1 Trade Trader Party Version 2 Trade Trader Party Version 3 Trade Trader Party Version 4 Trade …and you need versioning to do MVCC
  • 34.
    Trade Trader Party Party Trader Trade Party Trader Trade Party
  • 35.
    So better touse partitioning, spreading data around the cluster.
  • 36.
    Trader Party Trade Trade Trader Party
  • 37.
    Trader Party Trade Trade Trader Party
  • 41.
    ! This is whatusing Snowflake Schemas and the Connected Replication pattern is all about!
  • 44.
    Crosscutting Keys Common Keys
  • 45.
    Replicated Trader Party Trade Partitioned
  • 52.
    Valuation Legs Valuations art Transaction Mapping Cashflow Mapping Facts: Party Alias Transaction =>Big, Cashflows common Legs Parties keys Ledger Book Source Book Dimensions Cost Centre Product =>Small, Risk Organisation Unit Business Unit crosscutting HCS Entity Keys Set of Books 0 37,500,000 75,000,000 112,500,000 150,000,000
  • 54.
    Coherence’s KeyAssociation gives us this Trades MTMs Common Key
  • 55.
    Replicated Trader Party Trade Partitioned (
  • 56.
    Query Layer Trader Party Trade Transactions Data Layer Mtms Cashflows Fact Storage (Partitioned)
  • 57.
    Dimensions (repliacte) Transactions Mtms Facts Cashflows (distribute/ partition) Fact Storage (Partitioned)
  • 58.
    Valuation Legs Valuations Facts: art Transaction Mapping Cashflow Mapping Party Alias =>Big =>Distribute Transaction Cashflows Legs Parties Ledger Book Source Book Dimensions =>Small Cost Centre Product Risk Organisation Unit => Replicate Business Unit HCS Entity Set of Books 0 37,500,000 75,000,000 112,500,000 150,000,000
  • 59.
    We use avariant on a Snowflake Schema to partition big stuff, that has the same key and replicate small stuff that has crosscutting keys.
  • 60.
  • 61.
    Select Transaction, MTM,ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’
  • 63.
    Select Transaction, MTM,ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’ LBs[]=getLedgerBooksFor(CC1) SBs[]=getSourceBooksFor(LBs[]) So we have all the bottom level dimensions needed to query facts Transactions Mtms Cashflows Partitioned
  • 64.
    Select Transaction, MTM,ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’ LBs[]=getLedgerBooksFor(CC1) SBs[]=getSourceBooksFor(LBs[]) So we have all the bottom level dimensions needed to query facts Transactions Get all Transactions and Mtms MTMs (cluster side join) for the passed Source Books Cashflows Partitioned
  • 66.
    Select Transaction, MTM,ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’ Populate raw facts LBs[]=getLedgerBooksFor(CC1) (Transactions) with SBs[]=getSourceBooksFor(LBs[]) dimension data So we have all the bottom level before returning to dimensions needed to query facts client. Transactions Get all Transactions and Mtms MTMs (cluster side join) for the passed Source Books Cashflows Partitioned
  • 68.
    Replicated Partitioned Java client Dimensions Facts API We never have to do a distributed join!
  • 69.
    So all thebig stuff is held paritioned And we can join without shipping keys around and having intermediate results
  • 70.
    Trader Party Trade Trade Trader Party
  • 71.
    Trader Party Version 1 Trade Trader Party Version 2 Trade Trader Party Version 3 Trade Trader Party Version 4 Trade
  • 72.
    Trade Trader Party Party Trader Trade Party Trader Trade Party
  • 78.
    Valuation Legs Valuations rt Transaction Mapping Cashflow Mapping Party Alias Facts Transaction Cashflows Legs Parties This is a dimension Ledger Book •  It has a different Source Book Cost Centre key to the Facts. Dimensions Product •  And it’s BIG Risk Organisation Unit Business Unit HCS Entity Set of Books 0 125,000,000
  • 83.
    Party Alias Parties Ledger Book Source Book Cost Centre Product Risk Organisation Unit Business Unit HCS Entity Set of Books 0 1,250,000 2,500,000 3,750,000 5,000,000
  • 84.
    Party Alias Parties Ledger Book Source Book Cost Centre Product Risk Organisation Unit Business Unit HCS Entity Set of Books 20 1,250,015 2,500,010 3,750,005 5,000,000
  • 86.
    So we onlyreplicate ‘Connected’ or ‘Used’ dimensions
  • 87.
    Processing Layer Dimension Caches (Replicated) Transactions Data Layer As new Facts are added Mtms relevant Dimensions that they reference are moved Cashflows to processing layer caches Fact Storage (Partitioned)
  • 89.
    Query Layer Save Trade (With connected dimension Caches) Data Layer Cache Trade (All Normalised) Store Partitioned Trigger Source Cache Party Ccy Alias Book
  • 90.
    Query Layer (With connected dimension Caches) Data Layer Trade (All Normalised) Party Source Ccy Alias Book
  • 91.
    Query Layer (With connected dimension Caches) Data Layer Trade (All Normalised) Party Source Ccy Alias Book Party Ledger Book
  • 92.
    ‘Connected Replication’ A simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated
  • 94.
    Java client Java schema API Java ‘Stored Procedures’ and ‘Triggers’
  • 97.

Editor's Notes

  • #46 Big data sets are held distributed and only joined on the grid to collocated objects. Small data sets are held in replicated caches so they can be joined in process (only ‘active’ data is held)
  • #56 Big data sets are held distributed and only joined on the grid to collocated objects. Small data sets are held in replicated caches so they can be joined in process (only ‘active’ data is held)