Geode is Not a Cache,
it's an Analytics Engine!
By Evan Benoit (evan.benoit@resonate.com)
and Sharif Ghazzawi (sharif.ghazzawi@resonate.com)
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Who is Resonate?
Marketing and Advertising Technology Company
Located in Reston, VA
Give our clients insights into their customers’ values and motivations
Hiring Spring and Big Data Engineers!
http://www.resonate.com
What Kind of Data Do We Have?
We model predictions for thousands of different attributes
• Likes, Dislikes, Motivations, Behavior, Sentiments
[Diagram: 1000’s of attributes across 200 million cookies, 1.7 trillion total predictions; 1000’s of sites, 21.9 billion total site hits]
What Do We Do With the Data?
Our SaaS platform computes thousands of insights for our clients’ sites
• Example: “How many cookies that we’ve modeled as female Democrats hit my
homepage yesterday, and how does that compare to the general population?”
[Diagram: Venn diagrams with sets labeled Women, Dems, and Home page]
Each Insights Report requires
thousands of set operations
to be performed ad hoc,
within seconds!
Key Take-aways
Geode can be used as more than a simple Key-Value cache; it can run functions on
data in-memory.
Probabilistic Data Structures can be used in many industries to perform set
operations at scale.
A Spring/Geode architecture can improve performance and scalability.
What Didn’t Work? HBase
Brute force approach: HBase co-processors sequentially scanning bitmaps
Completely inappropriate use of HBase!
40-node cluster, 30-second queries
Essentially using HBase as an in-memory database
[Diagram: sequential scans over 1000’s of attributes, 200 million cookies, and 1000’s of sites]
Probabilistic Data Structures for Estimating
Cardinality of a Set
We have a counting problem. You probably do, too.
Our users don’t require exact precision. We’re not a bank!
Probabilistic data structures can estimate the cardinality of a set
They use a fixed amount of time and space
Yahoo Theta Sketch
Yahoo’s Theta Sketches give you estimated counts in a fixed amount of space…
… and they also support set operations!!
Example from https://datasketches.github.io/docs/Theta/ThetaJavaExample.html
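The code shown on this slide is not reproduced in the transcript. As a rough illustration of the idea only, here is a minimal toy sketch in Python. It is not the DataSketches library or its API: `ToySketch`, `build`, `union`, `intersect`, and the default `K` are hypothetical names chosen for this example. It keeps the K smallest hash values of a set, estimates the count from them, and returns a new sketch from each set operation.

```python
import hashlib
from dataclasses import dataclass

K = 1024  # hypothetical cap on retained hashes; larger K means lower error


def _hash01(item: str) -> float:
    """Map an item to a pseudo-uniform float in [0, 1) using 53 hash bits."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


@dataclass(frozen=True)
class ToySketch:
    theta: float        # sampling threshold: only hashes below theta are retained
    hashes: frozenset   # the retained hash values

    def estimate(self) -> float:
        # Retained hashes are a theta-fraction sample of the distinct items.
        return len(self.hashes) / self.theta


def _trim(hashes, theta: float, k: int) -> ToySketch:
    kept = sorted(h for h in hashes if h < theta)
    if len(kept) > k:
        theta = kept[k]      # the (k+1)-th smallest hash becomes the new threshold
        kept = kept[:k]
    return ToySketch(theta, frozenset(kept))


def build(items, k: int = K) -> ToySketch:
    return _trim({_hash01(i) for i in items}, 1.0, k)


def union(a: ToySketch, b: ToySketch, k: int = K) -> ToySketch:
    return _trim(a.hashes | b.hashes, min(a.theta, b.theta), k)


def intersect(a: ToySketch, b: ToySketch) -> ToySketch:
    theta = min(a.theta, b.theta)
    return ToySketch(theta, frozenset(h for h in a.hashes & b.hashes if h < theta))


# Made-up cookie IDs: the true overlap is 20,000 and the true union is 80,000.
women = build(f"cookie-{i}" for i in range(50_000))
dems = build(f"cookie-{i}" for i in range(30_000, 80_000))
print(round(intersect(women, dems).estimate()))
print(round(union(women, dems).estimate()))
```

Each sketch holds at most K hashes no matter how many cookies went into it, which is the fixed-space property. The estimates are approximate, with error shrinking as K grows; a set with fewer than K distinct items is counted exactly because theta stays at 1.0.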
Sketches Begin Multiplying Like Rabbits
Sketches can’t contain any additional metadata
We need a sketch for each attribute, for each tag
Next thing we know, we have 150 Million sketches, 2 Terabytes total
We need a place to store all these sketches
Example from https://datasketches.github.io/docs/Theta/ThetaJavaExample.html
[Diagram: sketches across 1000’s of attributes and 1000’s of sites]
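As a quick sanity check on the figures quoted on this slide, the average sketch size can be worked out directly. This assumes decimal terabytes (10^12 bytes), which the slide does not specify; the variable names are made up for the example.

```python
TOTAL_BYTES = 2 * 10**12     # "2 Terabytes", assuming decimal terabytes
SKETCH_COUNT = 150 * 10**6   # "150 Million sketches"

avg_bytes_per_sketch = TOTAL_BYTES / SKETCH_COUNT
print(f"{avg_bytes_per_sketch / 1000:.1f} KB per sketch on average")  # 13.3 KB per sketch on average
```

So each sketch is small on its own; the storage problem comes from the sheer number of them.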
We Need a Distributed In-memory Database…
System Architecture
System Characteristics
Data Locality
• We register Java methods built with the Theta Sketch library into Geode
• These set operations run close to the data. No need to shuffle data between
nodes. The sketches never leave Geode; Geode just returns the final count.
Performance
• Computing the cardinality of a set is now an O(1) lookup instead of an O(n) full
table scan
• Output of a set operation is a sketch rather than a number, allowing multiple
set operations to be chained together efficiently
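The data-locality point can be sketched with a toy model. This is not Geode code: `ToyServer`, `register`, `execute`, and the region keys are hypothetical, and exact Python sets stand in for sketches. The point it illustrates is that the function runs where the data lives and only a single number travels back to the caller.

```python
from typing import Callable, Dict, FrozenSet


class ToyServer:
    """Stand-in for one data node: holds the data and runs registered functions on it."""

    def __init__(self, region: Dict[str, FrozenSet[int]]):
        self._region = region
        self._functions: Dict[str, Callable[..., int]] = {}

    def register(self, name: str, fn: Callable[..., int]) -> None:
        self._functions[name] = fn

    def execute(self, name: str, *keys: str) -> int:
        # The stored sets are passed to the function in place; only the result leaves.
        fn = self._functions[name]
        return fn(*(self._region[k] for k in keys))


def intersection_count(first: FrozenSet[int], *rest: FrozenSet[int]) -> int:
    result = first
    for s in rest:
        result = result & s   # each intermediate result is still a set, so steps chain
    return len(result)


server = ToyServer({
    "attr:women": frozenset({1, 2, 3, 4}),
    "attr:dems": frozenset({3, 4, 5}),
    "site:homepage": frozenset({2, 3, 4, 6}),
})
server.register("intersection_count", intersection_count)
count = server.execute("intersection_count", "attr:women", "attr:dems", "site:homepage")
print(count)  # 2
```

With real sketches the shape is the same: each set operation yields another sketch, and the count is read off only at the end of the chain.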
System Characteristics
Fault Tolerance/Resiliency
• Geode Locators and Servers can be added/removed with zero downtime
• AWS Elastic Load Balancer (ELB) detects when a Geode ECS node is
unhealthy, kills the Docker container, spawns a new one
• Nodes are distributed across multiple AWS Availability Zones
Scalability
• Just add more servers and rebalance
Geode Regions
Geode gives a lot of options for how to persist and replicate your data
Original design called for persistent, replicated, partitioned Geode regions
• But persistence and replication made it difficult to swap out bad Geode nodes
• Geode checks the filesystem to ensure that no data was lost – Slow!
• Data is shuffled to honor the replication config – Slow!
Solution: We use AWS S3 as our persistent, replicated layer, not Geode
• Geode reads through from S3 whenever it doesn’t have the data
• We read through "parcels" containing thousands of sketches each, instead of
loading sketches one at a time
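The parcel read-through idea can be sketched as follows. This is an illustration under assumptions, not Resonate's implementation: the class name, the `site/attribute` key layout, and the in-memory dictionary standing in for S3 are all hypothetical. On a miss, the cache fetches the whole parcel containing the requested key, so neighbouring sketches arrive in the same round trip.

```python
from typing import Callable, Dict


class ParcelReadThroughCache:
    """On a miss, load the entire parcel that holds the key, not just the one entry."""

    def __init__(
        self,
        load_parcel: Callable[[str], Dict[str, bytes]],  # fetches one parcel from the backing store
        parcel_of: Callable[[str], str],                 # maps a sketch key to its parcel ID
    ):
        self._load_parcel = load_parcel
        self._parcel_of = parcel_of
        self._cache: Dict[str, bytes] = {}
        self.loads = 0  # number of backing-store round trips, for illustration

    def get(self, key: str) -> bytes:
        if key not in self._cache:
            parcel = self._load_parcel(self._parcel_of(key))
            self.loads += 1
            self._cache.update(parcel)  # every sketch in the parcel is now cached
        return self._cache[key]


# A dictionary stands in for the S3 bucket; keys and payloads are made up.
fake_s3 = {
    "site-42": {f"site-42/attr-{i}": b"sketch-bytes" for i in range(1000)},
}
cache = ParcelReadThroughCache(
    load_parcel=lambda parcel_id: fake_s3[parcel_id],
    parcel_of=lambda key: key.split("/")[0],
)
cache.get("site-42/attr-7")
cache.get("site-42/attr-8")
print(cache.loads)  # 1
```

Because a report touches many sketches for the same site, batching by parcel trades a slightly larger first fetch for far fewer round trips. The deck does not say how the read-through is wired into Geode.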
Geode and Docker
Geode doesn’t easily support Docker/ECS
• Recommended way of starting locators and servers is via Gfsh
• Gfsh starts locator/server in the background then exits
• Docker container exits/dies once there is no process running in the foreground
Solution: We added a dummy foreground process to keep the Docker container up
Geode and ECS
Geode Locators keep state on local disk, which is transient in AWS ECS
• Don't assume existence of a local disk
• Makes it difficult to honor "12 factor app" principles
Solution: We deploy Locator Docker containers on EC2 nodes with storage and keep
them associated with those nodes
Geode and Spring Boot
Spring-data-geode didn’t fit our production architecture
• Initially we tried embedding Geode in Spring Boot
• No lifecycle hooks for Spring apps to tap into for health checks
• Makes designing fault tolerance/resiliency and scalability difficult
Solution: We run Geode as a standalone process, not embedded in Spring
Geode and Configuration Data
Improved configuration management flexibility
• Geode comes with a tightly integrated configuration management sub-system
• Configs are uploaded to locators, distributed to servers
Many organizations already have a configuration management system
• e.g. Consul, ZooKeeper, spring-cloud-config
We’d love to see Geode’s configuration system be pluggable/swappable
Testing Distributed Systems
As with any distributed system, make sure you understand the consistency,
availability and partition-tolerance guarantees provided by your tools, and
ultimately your system
• Identify what parts of your system will provide redundancy
• How does your system respond to various failure scenarios?
• Test, Test, Test those scenarios
Summary
We Deployed a Spring and Geode Architecture
Containing Yahoo Theta Sketches
Significantly improved our main report’s performance
Reduced operating costs by 95% over our previous HBase implementation
Increased scalability
Simplified operations
Increased resiliency
Resonate is HIRING in RESTON!
Spring Engineers
Big Data Engineers (Spark, Geode, Hadoop, Kafka)
Dev Ops Engineers (AWS)
UX Engineers (Ember.js)
https://www.resonate.com/about/careers/
#springone@s1p
