Scaling Your Cache
& Caching at Scale


          Alex Miller
          @puredanger
Mission
• Why does caching work?
• What’s hard about caching?
• How do we make choices as we
  design a caching architecture?
• How do we test a cache for
  performance?
What is caching?
Lots of data
Memory Hierarchy
Approximate clock cycles to access:

   Register        1
   L1 cache        3
   L2 cache        15
   RAM             200
   Disk            10,000,000
   Remote disk     1,000,000,000
Facts of Life
  Register          Fast Small Expensive
 L1 Cache
 L2 Cache
Main Memory
 Local Disk
Remote Disk         Slow   Big   Cheap
Caching to the rescue!
Temporal Locality

[Animated diagram: a stream of requests flows past a small cache. At first the
cache is empty and the hit rate is 0%; as repeated keys recur in the stream,
the hit rate climbs to 65%.]
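
Temporal locality is easy to see in a toy simulation. The sketch below is
illustrative only (the key stream, cache size, and values are made up): it
replays a skewed request stream through a small LRU cache built on
LinkedHashMap and prints the resulting hit rate.

import java.util.LinkedHashMap;
import java.util.Map;

public class LocalityDemo {
    public static void main(String[] args) {
        final int capacity = 4;
        // Simple LRU cache: access-ordered LinkedHashMap that evicts the eldest entry.
        Map<Integer, String> cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity;
            }
        };

        // A skewed stream: a few hot keys dominate, as in real page-view traffic.
        int[] stream = {1, 2, 1, 3, 1, 2, 4, 1, 2, 5, 1, 2, 3, 1, 2, 1, 6, 1, 2, 1};
        int hits = 0;
        for (int key : stream) {
            if (cache.containsKey(key)) {
                cache.get(key);                          // touch to refresh recency
                hits++;
            } else {
                cache.put(key, "value-" + key);          // miss: load and insert
            }
        }
        System.out.printf("hit rate = %.0f%%%n", 100.0 * hits / stream.length);
    }
}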
Non-uniform distribution
[Chart: "Web page hits, ordered by rank". Pageviews per rank (0 to 3200) fall
off steeply while the cumulative % of total hits per rank climbs toward 100%:
a small number of top-ranked pages accounts for most of the traffic.]
Temporal locality + Non-uniform distribution
17000 pageviews
  assume avg load = 250 ms

cache 17 pages / 80% of views
   cached page load = 10 ms
    new avg load = 58 ms

    trade memory for
    latency reduction
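
The 58 ms figure is just a weighted average of cached and uncached load times.
A quick check with the slide's numbers (a throwaway sketch):

public class AvgLoadTime {
    public static void main(String[] args) {
        double hitRate = 0.80;      // 80% of views served from cache
        double cachedMs = 10;       // cached page load
        double uncachedMs = 250;    // uncached page load
        double avgMs = hitRate * cachedMs + (1 - hitRate) * uncachedMs;
        System.out.println(avgMs + " ms");   // 0.8*10 + 0.2*250 = 58.0 ms
    }
}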
The hidden benefit:
 reduces database load


[Diagram: Memory vs. Database, annotated with the "line of overprovisioning":
serving hits from memory defers the point where the database must be
overprovisioned to handle load.]
A brief aside...


• What is Ehcache?
• What is Terracotta?
Ehcache Example

CacheManager manager = new CacheManager();           // default config (ehcache.xml on the classpath)
Ehcache cache = manager.getEhcache("employees");     // look up the configured cache by name
cache.put(new Element(employee.getId(), employee));  // cache the employee under its id
Element element = cache.get(employee.getId());       // returns null if absent or expired


   <cache   name="employees"
            maxElementsInMemory="1000"
            memoryStoreEvictionPolicy="LRU"
            eternal="false"
            timeToIdleSeconds="600"
            timeToLiveSeconds="3600"
            overflowToDisk="false"/>
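
A common way to use this API is a cache-aside lookup: check the cache first and
fall back to the database on a miss. A minimal sketch, where Employee and
EmployeeDao are hypothetical stand-ins for the application's own types:

import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;

public class EmployeeRepository {

    // Minimal stand-ins for the application's own types (hypothetical).
    public static class Employee {
        final long id;
        Employee(long id) { this.id = id; }
        long getId() { return id; }
    }
    public interface EmployeeDao {
        Employee load(long id);     // database lookup
    }

    private final Ehcache cache;
    private final EmployeeDao dao;

    public EmployeeRepository(Ehcache cache, EmployeeDao dao) {
        this.cache = cache;
        this.dao = dao;
    }

    public Employee findById(long id) {
        Element element = cache.get(id);
        if (element != null) {
            return (Employee) element.getObjectValue();   // cache hit
        }
        Employee employee = dao.load(id);                 // cache miss: go to the database
        if (employee != null) {
            cache.put(new Element(id, employee));         // populate for the next caller
        }
        return employee;
    }
}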
Terracotta

App Node   App Node     App Node     App Node



           Terracotta   Terracotta
             Server       Server



App Node   App Node     App Node     App Node
But things are not
always so simple...
Pain of Large
   Data Sets
• How do I choose which
  elements stay in memory
  and which go to disk?
• How do I choose which
  elements to evict when I
  have too many?
• How do I balance cache size
  against other memory uses?
Eviction
When cache memory is full, what do I do?
• Delete - Evict elements
• Overflow to disk - Move to slower,
  bigger storage

• Delete local - But keep remote data
Eviction in Ehcache

Evict with “Least Recently Used” policy:
    <cache   name="employees"
             maxElementsInMemory="1000"
             memoryStoreEvictionPolicy="LRU"
             eternal="false"
             timeToIdleSeconds="600"
             timeToLiveSeconds="3600"
             overflowToDisk="false"/>
Spill to Disk in Ehcache
Spill to disk:
      <diskStore path="java.io.tmpdir"/>

      <cache   name="employees"
               maxElementsInMemory="1000"
               memoryStoreEvictionPolicy="LRU"
               eternal="false"
               timeToIdleSeconds="600"
               timeToLiveSeconds="3600"

               overflowToDisk="true"
               maxElementsOnDisk="1000000"
               diskExpiryThreadIntervalSeconds="120"
               diskSpoolBufferSizeMB="30" />
Terracotta Clustering
Terracotta configuration:
     <terracottaConfig url="server1:9510,server2:9510"/>

     <cache   name="employees"
              maxElementsInMemory="1000"
              memoryStoreEvictionPolicy="LRU"
              eternal="false"
              timeToIdleSeconds="600"
              timeToLiveSeconds="3600"
              overflowToDisk="false">

         <terracotta/>
     </cache>
Pain of Stale Data
• How tolerant am I of seeing
  values changed on the
  underlying data source?
• How tolerant am I of seeing
  values changed by another
  node?
Expiration

[Timeline 0-9 seconds illustrating the two expiration policies:]

TTI=4 (time-to-idle): an element expires 4 seconds after its last access,
      so repeated reads keep pushing expiration out.
TTL=4 (time-to-live): an element expires 4 seconds after it was created,
      no matter how often it is read.
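
When both are configured, an entry expires as soon as either bound is exceeded.
A minimal sketch of that check (illustrative logic, not Ehcache's internal
implementation):

public class ExpirationCheck {

    /**
     * An entry is expired once it has lived longer than TTL
     * or has gone unread longer than TTI.
     */
    static boolean isExpired(long nowMs, long createdMs, long lastAccessedMs,
                             long ttlSeconds, long ttiSeconds) {
        boolean pastTtl = (nowMs - createdMs) > ttlSeconds * 1000;
        boolean pastTti = (nowMs - lastAccessedMs) > ttiSeconds * 1000;
        return pastTtl || pastTti;
    }

    public static void main(String[] args) {
        long created = 0, lastRead = 3000, now = 6000;   // seconds 0, 3, 6 on the timeline
        System.out.println(isExpired(now, created, lastRead, 4, 4));   // true: past TTL=4
    }
}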
TTI and TTL in Ehcache

 <cache   name="employees"
          maxElementsInMemory="1000"
          memoryStoreEvictionPolicy="LRU"
          eternal="false"
          timeToIdleSeconds="600"
          timeToLiveSeconds="3600"
          overflowToDisk="false"/>
Replication in Ehcache
<cacheManagerPeerProviderFactory
    class="net.sf.ehcache.distribution.
           RMICacheManagerPeerProviderFactory"
    properties="hostName=fully_qualified_hostname_or_ip,
                peerDiscovery=automatic,
                multicastGroupAddress=230.0.0.1,
                multicastGroupPort=4446, timeToLive=32"/>

<cache name="employees" ...>
    <cacheEventListenerFactory
class="net.sf.ehcache.distribution.RMICacheReplicatorFactory”
         properties="replicateAsynchronously=true,
         replicatePuts=true,
         replicatePutsViaCopy=false,
         replicateUpdates=true,
         replicateUpdatesViaCopy=true,
         replicateRemovals=true,
         asynchronousReplicationIntervalMillis=1000"/>
</cache>
Terracotta Clustering
Still use TTI and TTL to manage stale data
between cache and data source

Coherent by default but can relax with
coherentReads="false"
Pain of Loading
• How do I pre-load the cache on startup?
• How do I avoid re-loading the data on every
  node?
Persistent Disk Store
<diskStore path="java.io.tmpdir"/>

<cache   name="employees"
         maxElementsInMemory="1000"
         memoryStoreEvictionPolicy="LRU"
         eternal="false"
         timeToIdleSeconds="600"
         timeToLiveSeconds="3600"
         overflowToDisk="true"
         maxElementsOnDisk="1000000"
         diskExpiryThreadIntervalSeconds="120"
         diskSpoolBufferSizeMB="30"

         diskPersistent="true" />
Bootstrap Cache Loader

Bootstrap a new cache node from a peer:
       <bootstrapCacheLoaderFactory
             class="net.sf.ehcache.distribution.
                    RMIBootstrapCacheLoaderFactory"
             properties="bootstrapAsynchronously=true,
                         maximumChunkSizeBytes=5000000"
             propertySeparator="," />




On startup, a background thread pulls the existing
cache data from another peer.
Terracotta Persistence
Nothing needed beyond setting up
Terracotta clustering.

Terracotta will automatically bootstrap:
- the cache key set on startup
- cache values on demand
Pain of Duplication
• How do I get failover capability while avoiding
  excessive duplication of data?
Partitioning + Terracotta
        Virtual Memory
•   Each node (mostly) holds data it has seen
•   Use load balancer to get app-level partitioning
•   Use fine-grained locking to get concurrency
•   Use memory flush/fault to handle memory
    overflow and availability
•   Use causal ordering to guarantee coherency
Scaling Your Cache
Scalability Continuum
(left to right: more scale)

runtime:          Ehcache   Ehcache RMI   Ehcache      Terracotta   Terracotta FX +
                                          disk store   OSS          Ehcache FX
# JVMs:           1 JVM     2 or more     2 or more    2 or more    2 or more,
                            JVMs          big JVMs     JVMs         up to lots of JVMs
causal ordering:  YES       NO            NO           YES          YES
management:       Ehcache DX management and control (left columns);
                  Ehcache EX and FX management and control (right columns)
Caching at Scale
Know Your Use Case
• Is your data partitioned (sessions) or
  not (reference data)?
• Do you have a hot set or uniform
  access distribution?
• Do you have a very large data set?
• Do you have a high write rate (50%)?
• How much data consistency do you
  need?
Types of caches
Name                     Communication           Advantage

Broadcast invalidation   multicast               low latency
Replicated               multicast               offloads db
Datagrid                 point-to-point          scalable
Distributed              2-tier point-to-point   all of the above
Common Data Patterns
  I/O pattern        Locality   Hot set   Rate of change
  Catalog/customer   low        low       low
  Inventory          high       high      high
  Conversations      high       high      low

Catalogs/customers:  warm all the data into cache; high TTL
Inventory:           fine-grained locking; write-behind to DB
Conversations:       sticky load balancer; disconnect conversations from DB
Build a Test

• As realistic as possible
• Use real data (or good fake data)
• Verify test does what you think
• Ideal test run is 15-20 minutes
Cache Warming

• Explicitly record cache warming or
  loading as a testing phase
• Possibly multiple warming phases
Lots o’ Knobs
Things to Change
• Cache size
• Read / write / other mix
• Key distribution
• Hot set
• Key / value size and structure
• # of nodes
Lots o’ Gauges
Things to Measure

• Application throughput (TPS)
• Application latency
• OS: CPU, Memory, Disk, Network
• JVM: Heap, Threads, GC
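
A bare-bones harness can capture throughput and average cache-operation latency
directly. The sketch below is illustrative only: the cache name, thread count,
key space, and hot-set skew are made-up parameters meant to be varied along the
knobs from the previous slides.

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;

public class CacheLoadTest {
    public static void main(String[] args) throws Exception {
        final Ehcache cache = new CacheManager().getEhcache("employees");
        final int threads = 8, keySpace = 100000, seconds = 60;
        final AtomicLong ops = new AtomicLong();
        final AtomicLong totalNanos = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final long end = System.nanoTime() + TimeUnit.SECONDS.toNanos(seconds);
        for (int i = 0; i < threads; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    Random random = new Random();
                    while (System.nanoTime() < end) {
                        // 90% of requests go to a 10% hot set; the rest are uniform.
                        int key = random.nextInt(100) < 90
                                ? random.nextInt(keySpace / 10)
                                : random.nextInt(keySpace);
                        long start = System.nanoTime();
                        Element element = cache.get(key);
                        if (element == null) {
                            cache.put(new Element(key, "value-" + key));  // simulate load on miss
                        }
                        totalNanos.addAndGet(System.nanoTime() - start);
                        ops.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(seconds + 10, TimeUnit.SECONDS);
        System.out.printf("throughput: %d ops/sec, avg latency: %.2f ms%n",
                ops.get() / seconds, totalNanos.get() / 1e6 / ops.get());
    }
}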
Benchmark and Tune

• Create a baseline
• Run and modify parameters
 • Test, observe, hypothesize, verify
• Keep a run log
Bottleneck Analysis
Pushing It
• If CPUs are not all busy...
 • Can you push more load?
 • Waiting for I/O or resources
• If CPUs are all busy...
 • Latency analysis
I/O Waiting
• Database
 • Connection pooling
 • Database tuning
 • Lazy connections
• Remote services
Locking and Concurrency

[Diagram: threads issuing get 2, get 2, put 8, and put 12 against a 16-entry
key/value store, with locks mediating which operations may proceed.]
Locking and Concurrency

[Diagram: the same operations against the same 16-entry key/value store, shown
under a different lock granularity.]
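
The point of the diagrams: with a single lock over the whole map, those four
operations serialize; with the key space striped across several locks,
operations on different keys proceed in parallel. A minimal striping sketch
(illustrative; java.util.concurrent.ConcurrentHashMap applies the same idea
internally):

import java.util.HashMap;
import java.util.Map;

public class StripedCache<K, V> {
    private final Object[] locks;
    private final Map<K, V>[] stripes;

    @SuppressWarnings("unchecked")
    public StripedCache(int stripeCount) {
        locks = new Object[stripeCount];
        stripes = new Map[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            locks[i] = new Object();
            stripes[i] = new HashMap<K, V>();
        }
    }

    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % locks.length;  // map key -> stripe index
    }

    public V get(K key) {
        int i = stripeFor(key);
        synchronized (locks[i]) {            // only this stripe is locked
            return stripes[i].get(key);
        }
    }

    public void put(K key, V value) {
        int i = stripeFor(key);
        synchronized (locks[i]) {            // puts on other stripes proceed concurrently
            stripes[i].put(key, value);
        }
    }
}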
Objects and GC
• Unnecessary object churn
• Tune GC
 • Concurrent vs parallel collectors
 • Max heap
 • ...and so much more
• Watch your GC pauses!!!
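
A starting point for HotSpot JVMs of that era might look like the line below
(illustrative flags and sizes only; the right choices depend on your heap size,
allocation rate, and pause goals):

# Fixed heap avoids resize pauses; CMS trades CPU for shorter pauses;
# GC logging makes pause times visible.
java -Xms4g -Xmx4g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -Xloggc:gc.log -jar app.jar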
Cache Efficiency
• Watch hit rates and latencies
 • Cache hit - should be fast
   • Unless concurrency issue
 • Cache miss
   • Local miss vs. miss to disk / cluster
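
One way to watch hit rate without depending on any particular statistics API is
to count at the access point. A small sketch (the wrapper class is hypothetical,
not part of Ehcache):

import java.util.concurrent.atomic.AtomicLong;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;

public class InstrumentedCache {
    private final Ehcache cache;
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    public InstrumentedCache(Ehcache cache) {
        this.cache = cache;
    }

    public Element get(Object key) {
        Element element = cache.get(key);
        if (element != null) {
            hits.incrementAndGet();     // hit: should be fast unless there is lock contention
        } else {
            misses.incrementAndGet();   // miss: caller falls back to disk/cluster/database
        }
        return element;
    }

    public double hitRate() {
        long h = hits.get(), m = misses.get();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }
}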
Cache Sizing
• Expiration and eviction tuning
 • TTI - manage moving hot set
 • TTL - manage max staleness
 • Max in memory - keep hot set
   resident
 • Max on disk / cluster - manage total
   disk / clustered cache
Cache Coherency

• No replication (fastest)
• RMI replication (loose coupling)
• Terracotta replication (causal
  ordering) - way faster than strict
  ordering
Latency Analysis

• Profilers
• Custom timers
• Tier timings
• Tracer bullets
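
"Custom timers" and "tier timings" can be as simple as a nanosecond stopwatch
accumulated per tier of a request. A rough, single-threaded sketch (the tier
names are placeholders):

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

public class TierTimer {
    private final Map<String, Long> nanosByTier = new LinkedHashMap<String, Long>();

    /** Time one tier of a request and accumulate it under a label. */
    public <T> T time(String tier, Callable<T> work) throws Exception {
        long start = System.nanoTime();
        try {
            return work.call();
        } finally {
            long elapsed = System.nanoTime() - start;
            Long prior = nanosByTier.get(tier);
            nanosByTier.put(tier, (prior == null ? 0L : prior) + elapsed);
        }
    }

    public void report() {
        for (Map.Entry<String, Long> e : nanosByTier.entrySet()) {
            System.out.printf("%-10s %.2f ms%n", e.getKey(), e.getValue() / 1e6);
        }
    }
}

Usage would wrap each tier of a request, for example timer.time("cache", ...)
around the cache lookup and timer.time("database", ...) around the fallback,
followed by timer.report().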
mumble-mumble*

 It’s time to add it to Terracotta.




                    * lawyers won’t let me say more
Thanks!

• Twitter - @puredanger
• Blog - http://tech.puredanger.com
• Terracotta - http://terracotta.org
• Ehcache - http://ehcache.org
