Data Storage for Extreme Use Cases: The Lay of the Land and a Peek at ODC
Ben Stopford : RBS
How fast is a HashMap lookup?

That's how long it takes light to travel a room.

How fast is a database lookup?

That's how long it takes light to go to Australia and back.

Computers really are very fast!

The problem is we're quite good at writing software that slows them down.
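To make the scale concrete, here is a naive timing sketch (illustrative only; a real measurement would use a harness such as JMH). A warmed-up HashMap lookup lands in the tens of nanoseconds, while any out-of-process database call starts with a network round trip measured in hundreds of microseconds to milliseconds.

import java.util.HashMap;
import java.util.Map;

// Naive micro-timing of HashMap.get(); not a rigorous benchmark.
public class LookupTiming {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) map.put(i, "value-" + i);

        // Warm up so the JIT has compiled the hot path before we time it.
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) sink += map.get(i % 1_000_000).length();

        int iterations = 10_000_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += map.get(i % 1_000_000).length(); // keep the result live
        }
        long perLookupNs = (System.nanoTime() - start) / iterations;
        System.out.println("~" + perLookupNs + " ns per lookup (sink=" + sink + ")");
    }
}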
Question:

Is it fair to compare the
performance of a Database with
a HashMap?
Of course not…
Mechanical Sympathy

[Latency scale from milliseconds down to picoseconds: cross-continental round trip (ms) → Ethernet ping → 1MB disk/Ethernet read (μs) → RDMA over Infiniband → main memory ref → 1MB main memory read → L2 cache ref → L1 cache ref (ns).]

* L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm.
Key Point #1

Simple computer programs, operating in a single address space, are extremely fast.
Why are there so many
types of database
these days?
…because we need
different architectures
for different jobs
Times are changing
Traditional Database
Architecture is Aging
The Traditional Architecture
[Diagram: the traditional options (Shared Disk, Shared Nothing, In Memory) and the move toward Distributed In Memory with a Simpler Contract.]
Key Point #2


  Different architectural
 decisions about how we
store and access data are
    needed in different
      environments.
Our ‘Context’ has changed
Simplifying the
   Contract
How big is the internet?



   5 exabytes
(which is 5,000 petabytes
 or 5,000,000 terabytes)
How big is an average enterprise database?

80% < 1TB
(in 2009)
The context of
our problem has
    changed
Simplifying the Contract
Databases have huge
operational overheads




Taken from “OLTP Through the Looking Glass, and What We Found There”, Harizopoulos et al.

Avoid that overhead by simplifying the contract and avoiding IO
Key Point #3



For the very top end data
   volumes a simpler
 contract is mandatory.
   ACID is simply not
        possible.
Key Point #3 (addendum)




 But we should always
retain ACID properties if
 our use case allows it.
Options for
scaling-out the
  traditional
 architecture
#1: The Shared Disk Architecture

[Diagram: several machines all reading from and writing to one shared disk.]
#2: The Shared Nothing
      Architecture
Each machine is responsible for a subset of the records. Each record exists on only one machine.

[Diagram: a client routes requests to six nodes, each owning a distinct key range, e.g. 1, 2, 3… / 97, 98, 99… / 169, 170… / 244, 245… / 333, 334… / 765, 769…]
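A minimal sketch of the routing idea (illustrative; not ODC's code): the owner of a record is a pure function of its key, so any client can locate a record without a central directory.

import java.util.List;

// Shared-nothing routing: each record lives on exactly one node,
// chosen deterministically from its key.
public class PartitionRouter {
    private final List<String> nodes; // e.g. host:port addresses

    public PartitionRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    /** Maps a record key to the single node that owns it. */
    public String ownerOf(Object key) {
        // floorMod guards against negative hash codes
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }
}

Real grids replace the bare modulo with consistent hashing or a partition table, so adding a node moves only a fraction of the records.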
#3: The In Memory Database
   (single address-space)
Databases must cache subsets
    of the data in memory




[Diagram: the database holds a cache of a subset of the data.]

Not knowing what you don't know

[Diagram: 90% of the data in cache; the rest only on disk.]
If you can fit it ALL in memory
you know everything!!
The architecture of an in
   memory database
Memory is at least 100x faster than
               disk
[Latency scale: a 1MB disk/network read sits orders of magnitude above a 1MB main memory read; cross-continental round trip (ms) → cross-network round trip (μs) → main memory ref (ns) → L2 cache ref → L1 cache ref.]

* L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm.
Random vs. Sequential Access
This makes them very fast!!
The proof is in the stats. TPC-H
Benchmarks on a 1TB data set
So why haven't in-memory databases taken off?
Address-Spaces are relatively
small and of a finite, fixed size
Durability
One solution is
 distribution
Distributed In Memory (Shared
           Nothing)
Again we spread our data but this time
          only using RAM.




[Diagram: as before, a client routes to nodes that each own a key range, but the data is now held in RAM rather than on disk.]
Distribution solves our two
         problems
We get massive amounts of
   parallel processing
But at the cost of losing the single address space
[Diagram: the same map as before, from the traditional options (Shared Disk, Shared Nothing, In Memory) toward Distributed In Memory with a Simpler Contract.]
Key Point #4
There are three key forces:

• Distribution: gain scalability through a distributed architecture.
• Simplify the contract: improve scalability by picking appropriate ACID properties.
• No disk: all data is held in RAM.
These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse
ODC
ODC represents
   a balance
    between
throughput and
    latency
What is Latency?
What is Throughput?
Which is best for latency?

[Diagram: Traditional Database vs Shared Nothing (Distributed) vs In-Memory Database. Latency?]
Which is best for throughput?

[Diagram: Traditional Database vs Shared Nothing (Distributed) vs In-Memory Database. Throughput?]
So why do we use distributed
        in-memory?
[Diagram: In Memory gives latency; plentiful hardware gives throughput.]
ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised, Realtime Graph DB

450 processes, 2TB of RAM

Messaging (Topic Based) as a system of record (persistence)
The Layers

[Diagram: Access Layer (Java client APIs) → Query Layer → Data Layer (Transactions, MTMs, Cashflows) → Persistence Layer.]
Three Tools of Distributed Data Architecture

[Triangle: Indexing, Partitioning, Replication.]
How should we use these tools?
Replication puts data
     everywhere




      But your storage is limited by
      the memory on a node
Partitioning scales
     Associating data in
     different partitions implies
     moving it.




     Scalable storage, bandwidth
     and processing
So we have some data.
Our data is bound together in a model.

[Object model: a Trade linked to a Trader and a Party, with further links to Desk, Sub-Party and Name.]
Which we save..

[Diagram: the Trader, Party and Trade objects end up stored on different machines across the cluster.]
Binding them back together involves a "distributed join" => lots of network hops

[Diagram: reassembling the Trader, Party and Trade objects requires hops between the machines that hold them.]
The hops have to be spread over time

[Diagram: each hop crosses the network in sequence, so total latency grows with the number of hops.]
Lots of network hops makes it
            slow
OK – what if we held it
all together??
“Denormalised”
Hence denormalisation is FAST!
           (for reads)
Denormalisation implies the
duplication of some sub-entities
…and that means managing
consistency over lots of copies
…and all the duplication means
 you run out of space really
            quickly
Space issues are exacerbated further when data is versioned

[Diagram: each new version duplicates the whole denormalised graph (Trader, Party, Trade), Version 1 through Version 4.]

…and you need versioning to do MVCC
And reconstituting a previous time slice becomes very difficult.

[Diagram: picking the right Trade, Trader and Party versions for a historic time slice means stitching fragments from many copies.]
So we want to hold
   entities separately
(normalised) to alleviate
    concerns around
 consistency and space
          usage
Remember this means the object graph will be split across multiple machines.

[Diagram: each entity (Trader, Party, Trade) is held as a singleton, independently versioned, spread across the cluster.]
Binding them back together involves a "distributed join" => lots of network hops

[Diagram: as before, reassembling the normalised entities requires network hops between machines.]
Whereas in the denormalised model the join is already done
So what we want is the advantages
of a normalised store at the speed
of a denormalised one!


This is what using Snowflake Schemas and
  the Connected Replication pattern is all
                   about!
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys
We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.

[Diagram: crosscutting keys vs common keys.]
We tackle this problem with a hybrid model:

[Diagram: Trader and Party are replicated; Trade is partitioned.]
We adapt the concept of a
  Snowflake Schema.
Taking the concept of Facts and
          Dimensions
Everything starts from a Core
    Fact (Trades for us)
Facts are big, dimensions are small
Facts have one key that relates
  them all (used to partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data:

Facts => big, common keys
Dimensions => small, crosscutting keys
We remember we are a grid. We
 should avoid the distributed
            join.
…so we only want to 'join' data that is in the same process.

[Diagram: Trades and MTMs share a common key, assigned via a Key Assignment Policy (e.g. KeyAssociation in Coherence), so related facts land in the same process.]
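For illustration, here is roughly what such a key-assignment policy looks like in Coherence. The MtmKey class and its fields are invented for this sketch, but KeyAssociation and getAssociatedKey() are the real Coherence hook for collocating related entries.

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;
import java.util.Objects;

// Hypothetical key for an MTM record. Implementing KeyAssociation tells
// Coherence to partition this entry by the key returned from
// getAssociatedKey() (here, the parent trade's id), so a Trade fact and
// its MTM facts always land in the same partition and join locally.
public class MtmKey implements KeyAssociation, Serializable {
    private final long mtmId;
    private final long tradeId; // the common key shared with the Trade fact

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId; // collocate with the owning trade
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MtmKey)) return false;
        MtmKey other = (MtmKey) o;
        return mtmId == other.mtmId && tradeId == other.tradeId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(mtmId, tradeId);
    }
}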
So we prescribe different physical storage for Facts and Dimensions

[Diagram: Trader and Party are replicated; Trade is partitioned.]
Facts are partitioned, dimensions are replicated

[Diagram: the query layer holds the replicated Trader and Party dimensions; the data layer holds Transactions, MTMs and Cashflows in partitioned fact storage.]
Facts are partitioned, dimensions are replicated

[Diagram: Dimensions (replicated) sit above the Facts (distributed/partitioned): Transactions, MTMs and Cashflows in the partitioned fact storage.]
The data volumes back this up as a sensible hypothesis

Facts => big => distribute
Dimensions => small => replicate
Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.
[Diagram: replicate the small stuff; distribute the big stuff.]
So how does this help us to run queries without distributed joins?

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’
What would this look like without this pattern?

[Diagram: seven sequential fetches (get Cost Centers, Ledger Books, Source Books, Transactions, MTMs, Legs, Cost Centers), each a network hop, spread over time.]
But by balancing Replication and Partitioning we don't need all those hops

[Diagram: the same fetches, but no longer strung out as sequential network hops.]
Stage 1: Focus on the where clause:

Where Cost Centre = ‘CC1’
Stage 1: Get the right keys to query the Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

[Diagram: dimensions are joined in the query layer to resolve the keys; the partitioned facts (Transactions, MTMs, Cashflows) are untouched so far.]
Stage 2: Cluster join to get Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

[Diagram: the facts (Transactions, MTMs, Cashflows) are joined across the cluster; because they are collocated by key, each node joins its own partition locally.]
Stage 2: Join the facts together
efficiently as we know they are
            collocated
Stage 3: Augment raw Facts with relevant Dimensions

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

[Diagram: dimensions are joined to the fact results in the query layer, from the replicated caches.]
Stage 3: Bind relevant
dimensions to the result
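A hedged sketch of the three stages against Coherence caches. The cache names ("source-books", "trades", "mtms") and accessor methods (getCostCentre, getBookId, getTradeId) are invented for illustration; the real ODC query layer is more involved, but the Coherence calls used (keySet/entrySet with filters) are standard.

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.filter.EqualsFilter;
import com.tangosol.util.filter.InFilter;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CostCentreQuery {
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static void run(String costCentre) {
        NamedCache books  = CacheFactory.getCache("source-books"); // replicated dimension
        NamedCache trades = CacheFactory.getCache("trades");       // partitioned fact
        NamedCache mtms   = CacheFactory.getCache("mtms");         // partitioned fact

        // Stage 1: resolve the where-clause against the dimensions. The
        // dimension cache is replicated, so this never leaves the process.
        Set bookIds = books.keySet(new EqualsFilter("getCostCentre", costCentre));

        // Stage 2: query the partitioned facts in one parallel scatter-gather.
        // Trades and MTMs share a partitioning key, so each storage node can
        // join its own slice locally; no keys ship between data nodes.
        Set<Map.Entry> tradeEntries = trades.entrySet(new InFilter("getBookId", bookIds));
        Set tradeIds = new HashSet();
        for (Map.Entry e : tradeEntries) tradeIds.add(e.getKey());
        Set<Map.Entry> mtmEntries = mtms.entrySet(new InFilter("getTradeId", tradeIds));

        // Stage 3: bind the remaining dimensions to the result in the query
        // layer, again from local replicated caches (omitted here).
        System.out.println(tradeEntries.size() + " trades, " + mtmEntries.size() + " MTMs");
    }
}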
Bringing it together:

[Diagram: the Java client API sits above replicated Dimensions and partitioned Facts.]

We never have to do a distributed join!
So all the big stuff is
  held partitioned



 And we can join
 without shipping
 keys around and
having intermediate
      results
We get to do this…


[Diagram: the normalised model, with Trader, Party and Trade held separately across the cluster.]
…and this…

[Diagram: independently versioned entities, Version 1 through Version 4, without duplicating the whole graph.]
..and this..

[Diagram: reconstituting a historic time slice from the independently versioned entities.]
…without the problems of this…
…or this..
..all at the speed of this… well
              almost!
But there is a fly in the
      ointment…
I lied earlier. These aren't all Facts.

[Diagram: most of the tables are Facts, but one is a dimension: it has a different key to the Facts, and it's BIG.]
We can't replicate really big stuff… we'll run out of space => big dimensions are a problem.
Fortunately there is a simple
solution!
Whilst there are lots of these
big dimensions, a large majority
are never used. They are not all
“connected”.
If there are no Trades for Goldmans
in the data store then a Trade Query
will never need the Goldmans
Counterparty
Looking at the Dimension data
    some are quite large
But Connected Dimension Data
    is tiny by comparison
One recent independent study
from the database community
showed that 80% of data
remains unused
So we only replicate
‘Connected’ or ‘Used’
     dimensions
As data is written to the data store we keep our 'Connected Caches' up to date

[Diagram: the processing layer holds replicated dimension caches; the data layer holds the partitioned fact storage (Transactions, MTMs, Cashflows). As new Facts are added, the relevant Dimensions they reference are moved to the processing-layer caches.]
The Replicated Layer is updated
by recursing through the arcs
on the domain model when facts
change
Saving a trade causes all its 1st-level references to be triggered

[Diagram: a Save Trade call passes through the query layer (with connected dimension caches) to the data layer (all normalised); the cache-store write to the partitioned cache triggers replication of Party Alias, Source Book and Ccy.]
This updates the connected caches

[Diagram: Party Alias, Source Book and Ccy move up into the query layer's connected dimension caches.]
The process recurses through the object graph

[Diagram: the recursion continues from Party Alias to Party and from Source Book to Ledger Book, replicating those too.]
‘Connected Replication’
   A simple pattern which
recurses through the foreign
     keys in the domain
    model, ensuring only
‘Connected’ dimensions are
         replicated
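A minimal sketch of the idea (an assumed shape, not ODC's actual trigger code): on each fact write, recurse through the entity's references, using the replicated set itself to deduplicate and to terminate on cycles in the object graph.

import java.util.Set;

public class ConnectedReplicator {

    /** Minimal domain interface: anything that can list the entities it references. */
    public interface Entity {
        Set<Entity> references(); // foreign-key targets, e.g. Trade -> Book, Party
    }

    private final Set<Entity> replicatedCache; // stand-in for the replicated layer

    public ConnectedReplicator(Set<Entity> replicatedCache) {
        this.replicatedCache = replicatedCache;
    }

    /** Called when a fact (e.g. a Trade) is written to the partitioned store. */
    public void onFactSaved(Entity fact) {
        for (Entity dim : fact.references()) {
            replicate(dim);
        }
    }

    private void replicate(Entity dim) {
        // add() returns false if already present, which both deduplicates
        // and stops the recursion on cycles in the object graph
        if (replicatedCache.add(dim)) {
            for (Entity next : dim.references()) {
                replicate(next);
            }
        }
    }
}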
With ‘Connected Replication’ only 1/10th of the data needs to be replicated (on average).
Limitations of this approach
Conclusion

[Diagram: partitioned storage.]
The End


Editor's Notes

  • #14 I started a project back in 2004. It was a trading system back at Barcap. When it came to persisting our data there were three choices: Oracle, Sybase or SQL Server. A lot has changed in that time. Today, we are far more likely to look at one of a variety of technologies to satisfy our need to store and re-retrieve our data. So how many of you use a traditional database? What about a distributed database like Oracle RAC? NoSQL? Do you use it with a database or stand alone? What about an in-memory database, in production? Finally, what about distributed in-memory? This talk is about an in-memory database. It's not really a distributed cache, despite being implemented in Coherence, although you could call it one if you preferred. In truth it has a variety of elements that make it closer to what you might perceive to be a database. It is normalised: that is to say that it holds entities independently from one another and versions them as such. It has some basic guarantees of atomicity when writing certain groups of objects that are collocated. Most importantly it is both fast and scalable regardless of the join criteria you impose on it, this being something fairly elusive in the world of distributed data storage. I have a few aims for today: I hope you will leave with a broader view on what stores are available to you and what is coming in the future. I hope you'll see the benefits that niche storage solutions can provide through simpler contracts between client and data store. I'd like you to understand the benefits of memory over disk.
  • #88 A better example is Amazon: partition by user so orders and basket are held together; products will be shared by multiple users.
  • #89 Big data sets are held distributed and only joined on the grid to collocated objects. Small data sets are held in replicated caches so they can be joined in process (only ‘active’ data is held).
  • #99 Big data sets are held distributed and only joined on the grid to collocated objects. Small data sets are held in replicated caches so they can be joined in process (only ‘active’ data is held).