© Copyright 2013 Pivotal. All rights reserved.
Building scalable applications using Pivotal GemFire/Apache Geode
Yogesh Mahajan
ymahajan@apache.org
Eliminate disk access in the real-time path
Too much I/O
Design roots don't necessarily apply today
• Too much focus on ACID
• Disk synchronization bottlenecks
Buffers primarily tuned for I/O
First write to the log
Second write to the data files
We challenge the traditional RDBMS design, NOT SQL
IMDG basic concepts
– Distributed, memory-oriented store
• KV/objects or JSON
• Queryable, indexable and transactional
– Multiple storage models
• Replication, partitioning in memory
• With synchronous copies in the cluster
• Overflow to disk and/or RDBMS
– Parallelize Java app logic
– Multiple failure detection schemes
– Dynamic membership (elastic)
– Vendors differentiate on
• Query support, WAN, events, etc.
(Diagram: a Replicated Region uses synchronous replication for slow-changing data; a Partitioned Region, with a redundant copy, holds large or highly transactional data; the grid handles thousands of concurrent connections with low latency for thousands of clients.)
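To make the two storage models concrete, here is a minimal toy sketch, assuming a fixed bucket count (13) and a simple round-robin placement of each bucket's primary and redundant copy. The class names, the bucket count and the placement rule are all hypothetical illustrations; this is not Geode's API or its actual bucket-placement and rebalancing logic.

```python
import zlib


class ReplicatedRegion:
    """Every member holds a full copy of every entry (suits slow-changing data)."""

    def __init__(self, members):
        self.stores = {m: {} for m in members}

    def put(self, key, value):
        # Synchronous replication: the write reaches every member before put() returns.
        for store in self.stores.values():
            store[key] = value

    def get(self, key, member):
        # Any member can serve the read from its local copy.
        return self.stores[member].get(key)


class PartitionedRegion:
    """Keys hash to buckets; each bucket has one primary plus `redundancy` copies."""

    def __init__(self, members, num_buckets=13, redundancy=1):
        if redundancy >= len(members):
            raise ValueError("need more members than redundant copies")
        self.members = list(members)
        self.num_buckets = num_buckets
        self.stores = {m: {} for m in self.members}
        # Hypothetical placement: bucket b's primary is member b mod n,
        # and its redundant copies go to the following members.
        self.owners = {
            b: [self.members[(b + i) % len(self.members)] for i in range(redundancy + 1)]
            for b in range(num_buckets)
        }

    def bucket_of(self, key):
        # crc32 gives a placement that is stable across Python runs.
        return zlib.crc32(str(key).encode()) % self.num_buckets

    def put(self, key, value):
        # Write the primary and every redundant copy of the key's bucket.
        for member in self.owners[self.bucket_of(key)]:
            self.stores[member][key] = value

    def get(self, key):
        # Read from the first owner that still holds the entry.
        for member in self.owners[self.bucket_of(key)]:
            if key in self.stores[member]:
                return self.stores[member][key]
        return None

    def fail(self, member):
        # Simulate losing a member: its local data disappears.
        self.stores[member] = {}
```

With three members and `redundancy=1`, each entry lives on exactly two members, so losing any single member leaves every key readable, while the replicated region pays for full copies everywhere in exchange for local reads on any member.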
Key IMDG pattern – Distributed caching
• Designed to work with existing RDBs
– Read through: fetch from the DB on a cache miss
– Write through: update the cache only if (IFF) the DB write succeeds
– Write behind: reliable, in-order queue and batched writes to the DB
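The first two patterns can be sketched in a few lines. This is a minimal illustration assuming an in-process dictionary stands in for the database; `FakeDatabase` and `ReadWriteThroughCache` are hypothetical names. In Geode these roles are usually played by a `CacheLoader` (read through) and a `CacheWriter` (write through), but the sketch does not use that API.

```python
class FakeDatabase:
    """Stand-in for the RDBMS; set fail_writes to simulate a failing DB write."""

    def __init__(self):
        self.rows = {}
        self.reads = 0
        self.fail_writes = False

    def load(self, key):
        self.reads += 1
        return self.rows.get(key)

    def store(self, key, value):
        if self.fail_writes:
            raise OSError("database write failed")
        self.rows[key] = value


class ReadWriteThroughCache:
    def __init__(self, db):
        self.db = db
        self.entries = {}

    def get(self, key):
        # Read through: on a cache miss, fetch from the DB and populate the cache.
        if key not in self.entries:
            value = self.db.load(key)
            if value is None:
                return None
            self.entries[key] = value
        return self.entries[key]

    def put(self, key, value):
        # Write through: write the DB first; store() raises on failure,
        # so the cache is updated only when the DB write succeeded.
        self.db.store(key, value)
        self.entries[key] = value
```

Only the first `get` of a key touches the database; a failed `put` leaves the cached value unchanged, which is the "IFF" guarantee on the slide.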
Traditional RDB integration can be challenging
(Diagram: memory tables feeding a single DB writer, steps 1 to 4.)
Synchronous "write through"
• Single point of bottleneck and failure
• Not an option for "write heavy" workloads
• Complex 2-phase commit protocol
• Parallel recovery is difficult
(Diagram: updates go into a queue; a DB synchronizer applies them asynchronously, in batches.)
Asynchronous "write behind"
• Cannot sustain high "write" rates
• Queue may have to be persistent
• Parallel recovery is difficult
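The write-behind flow in the second diagram can be sketched as an in-order queue that a synchronizer drains in batches. This is a minimal single-process illustration: a plain dictionary stands in for the database table, the batch size of 3 is arbitrary, and `WriteBehindCache` is a hypothetical name. Geode provides this pattern through `AsyncEventQueue` and `AsyncEventListener`, which the sketch does not use; it also keeps the queue in memory, whereas the slide notes the queue may have to be persistent.

```python
from collections import deque
from itertools import islice


class WriteBehindCache:
    """Writes land in memory at once; a queue drains them to the DB in batches."""

    def __init__(self, db_rows, batch_size=3):
        self.entries = {}
        self.queue = deque()      # in-order queue of pending (key, value) updates
        self.db_rows = db_rows    # stand-in for the database table
        self.batch_size = batch_size
        self.batches_written = 0

    def put(self, key, value):
        # The caller waits only for the in-memory update and the enqueue.
        self.entries[key] = value
        self.queue.append((key, value))

    def flush_batch(self):
        n = min(self.batch_size, len(self.queue))
        if n == 0:
            return 0
        batch = list(islice(self.queue, n))  # peek without removing
        for key, value in batch:             # applied in queue order
            self.db_rows[key] = value        # a real DB would take one batched statement
        for _ in range(n):
            self.queue.popleft()             # dequeue only after the batch is applied
        self.batches_written += 1
        return n

    def flush_all(self):
        while self.queue:
            self.flush_batch()
```

Removing entries only after a batch is applied means a failed DB write leaves the updates queued for retry, and applying them in queue order ensures the last write to a key is the one that ends up in the database.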
Some IMDGs and NoSQL stores offer "shared nothing persistence"
• Append-only operation logs
• Fully parallel
• Zero disk seeks
• But cluster restart requires a log scan
• Very large volumes pose challenges
(Diagram: memory tables write records (Record1, Record2, Record3) through OS buffers into append-only operation logs; a log compressor rewrites the logs.)
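A minimal sketch of shared-nothing persistence with an append-only operation log, assuming a Python list of JSON lines stands in for the log file on one member's local disk. `OplogStore` is a hypothetical name, and `compact()` is a simple stand-in for the log compressor in the diagram, not GemFire's actual oplog format or compaction algorithm.

```python
import json


class OplogStore:
    """In-memory table persisted as an append-only operation log."""

    def __init__(self, log=None):
        self.log = log if log is not None else []  # stand-in for an append-only file
        self.table = {}
        self._recover()

    def _recover(self):
        # Restart path: scan the whole log and replay operations in order.
        for line in self.log:
            op = json.loads(line)
            if op["op"] == "put":
                self.table[op["key"]] = op["value"]
            else:
                self.table.pop(op["key"], None)

    def put(self, key, value):
        # Sequential append only; existing records are never rewritten in place.
        self.log.append(json.dumps({"op": "put", "key": key, "value": value}))
        self.table[key] = value

    def delete(self, key):
        self.log.append(json.dumps({"op": "del", "key": key}))
        self.table.pop(key, None)

    def compact(self):
        # Rewrite the log so it holds only the latest live value per key.
        self.log[:] = [
            json.dumps({"op": "put", "key": k, "value": v})
            for k, v in self.table.items()
        ]
```

Writes never seek, but restart cost grows with log length, which is the "restart requires a log scan" caveat on the slide; compaction bounds that cost by discarding superseded records.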
Our GemFire Journey Over the Years
Timeline: 2004, 2008, 2014
• Massive increase in data volumes
• Falling margins per transaction
• Increasing cost of IT maintenance
• Need for elasticity in systems
• Financial services providers (every major Wall Street bank)
• Department of Defense
• Real-time response needs
• Time-to-market constraints
• Need for flexible data models across the enterprise
• Distributed development
• Persistence + in-memory
• Global data visibility needs
• Fast ingest needs for data
• Need to allow devices to hook into enterprise data
• Always on
• Largest travel portal
• Airlines
• Trade clearing
• Online gambling
• Largest telcos
• Large manufacturers
• Largest payroll processor
• Auto insurance giants
• Largest rail systems on earth
Hybrid transactional/analytics grids
Why OSS? Why Apache?
• Open source software is fundamentally changing buying patterns
– Developers have to endorse product selection (no longer a CIO handshake)
– Community endorsement is key to product visibility
– Open source credentials attract the best developers
– Vendor credibility is directly tied to the street credibility of the product
• Align with the tides of history
– Customers are increasingly asking to participate in product development
– Résumé-driven development forces customers to consider OSS products
– Allow product development to happen with full transparency
• Apache is where you go to build open source street cred
– A transparent meritocracy that puts developers in charge
Geode Will Be a Significant Apache Project
• Over 1,000 person-years invested in cutting-edge R&D
• 1,000+ customers in very demanding verticals
• Cutting-edge use cases that have shaped product thinking
• Tens of thousands of distributed, scaled-up tests that can randomize every aspect of the product
• A core technology team that has stayed together since its founding
• Performance differentiators baked into every aspect of the product
GemFire High-Level Architecture
What makes it fast?
• Minimize copying
– Clients dynamically acquire partitioning metadata for single-hop access
– Avoid JVM memory pools to the extent possible
• Minimize contention points; avoid offloading to the OS scheduler
– Highly concurrent data structures
– Efficient data transmission (Nagle's algorithm)
• Flexible consistency model
– FIFO consistency across replicas, but NO global ordering across threads
– Promote single-row transactions (i.e., no transactions)
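Single-hop access can be illustrated with a client that caches a bucket-to-server map and routes each operation straight to the owning server, so no server has to forward the request. This is a toy sketch: the bucket count, the modulo placement, and the `Server`/`SingleHopClient` names are hypothetical, and `refresh_metadata` fabricates the map locally where a real client would fetch it from the cluster.

```python
import zlib


class Server:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.requests = 0

    def put(self, key, value):
        self.requests += 1
        self.data[key] = value


class SingleHopClient:
    """Caches bucket -> primary-server metadata and sends each op directly there."""

    def __init__(self, servers, num_buckets=7):
        self.servers = {s.name: s for s in servers}
        self.num_buckets = num_buckets
        self.metadata = {}  # bucket id -> server name, acquired lazily

    def refresh_metadata(self):
        # Hypothetical: a real client would obtain this map from the servers.
        names = sorted(self.servers)
        self.metadata = {b: names[b % len(names)] for b in range(self.num_buckets)}

    def put(self, key, value):
        if not self.metadata:
            self.refresh_metadata()
        bucket = zlib.crc32(str(key).encode()) % self.num_buckets
        # Exactly one network hop: straight to the owning server.
        self.servers[self.metadata[bucket]].put(key, value)
```

Because the client computes the owner itself, each `put` costs one request in total; without the metadata, a request landing on a non-owner would need a second hop to reach the primary.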
What makes it fast?
• Avoid disk seeks
– Data kept in memory: 100 times faster than disk
– Keep indexes in memory, even when data is on disk
– Direct pointers to the disk location when data is offloaded
• Tiered caching
– Eventually consistent client caches
– Avoid slow-receiver problems
• Partition and parallelize everything
– Data, application processing (procedures, callbacks), queries, write-behind, CQ/event processing
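The "indexes in memory, direct pointers to disk" idea can be sketched as an in-memory map from key to (offset, length) in an append-only data file. This is a minimal sketch under that assumption; `OverflowStore` and the file layout are hypothetical, not GemFire's overflow format. Finding a value needs no disk access, and reading it is a single seek plus read.

```python
import os
import tempfile


class OverflowStore:
    """Values overflow to disk; keys and (offset, length) pointers stay in memory."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset, length) of the value in the data file

    def put(self, key, value: bytes):
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # new values are appended at the end
            f.write(value)
        self.index[key] = (offset, len(value))

    def get(self, key):
        # The lookup is purely in memory; the read is one direct seek.
        if key not in self.index:
            return None
        offset, length = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)


# Example: the index lives in memory, the values in a temporary data file.
with tempfile.TemporaryDirectory() as tmp:
    store = OverflowStore(os.path.join(tmp, "overflow.dat"))
    store.put("a", b"hello")
    store.put("b", b"world!")
    print(store.index, store.get("b"))
```

Overwriting a key appends a new value and moves the pointer; the stale bytes stay in the file until some compaction step reclaims them, which the sketch leaves out.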
"Low touch" usage patterns
HTTP session management
– Simple template for tc Server, Tomcat and app servers
– Shared-nothing persistence, global session state
Hibernate L2 cache plugin
– Set the cache in hibernate.cfg.xml
– Support for query and entity caching
Memcached protocol
– Servers understand the memcached wire protocol
– Use any memcached client
Spring Cache Abstraction
<bean id="cacheManager"
      class="org.springframework.data.gemfire.support.GemfireCacheManager"/>
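Because the servers speak the memcached wire protocol, any client that emits standard memcached text-protocol commands can talk to them. The sketch below only builds the bytes for the two most common commands, following the standard ASCII protocol (`set <key> <flags> <exptime> <bytes>` followed by a data block, and `get <key>*`); the helper names are hypothetical, and which commands GemFire's memcached support accepts beyond these is not covered here.

```python
def memcached_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol 'set' command: header line, then data block."""
    if len(key.encode()) > 250 or any(c.isspace() for c in key):
        raise ValueError("memcached keys are at most 250 bytes and contain no whitespace")
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
    return header + value + b"\r\n"


def memcached_get(*keys: str) -> bytes:
    """Build a 'get' command for one or more keys."""
    return ("get " + " ".join(keys) + "\r\n").encode()
```

For example, `memcached_set("greeting", b"hello")` yields `b"set greeting 0 0 5\r\nhello\r\n"`, which a client would write to the server's memcached port before reading the `STORED` reply.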
A GemFire customer use case: IRCTC
• World's second largest railway network: 7,000 stations, 30 million users, 12,000 trains
• Long queues at railway booking counters
• Unable to scale during peak hours (8 AM, 10 AM)
• System designed back in 2005/2006
• Frequent downtime; booking a ticket took more than 10 minutes or timed out
Old Architecture
(Diagram components: PRS; Oracle DB; e-ticketing application on 72 physical servers.)
Architecture Using GemFire
(Diagram components: PRS; Oracle DB; distributed in-memory data grid; next-gen e-ticketing application written on EJB 3.1 and deployed on 72 physical servers (4 instances/server), using Oracle WebLogic.)
Challenges
• Social infrastructure site
• Migrating 30 million registered users
• Booking transaction checkpoints because of supply-demand gaps
• Migrating Journey Planner and user authentication to in-memory
• Capable of scaling up as demand increases in the future
• High number of concurrent users at peak times
Architecture  Using  GemFire
Architecture  Using  GemFire
Benefits
• Supports more than 200,000 concurrent purchases
• Stable performance, booking approximately 150,000 TPH compared to 60,000 in the old system
• Transformed customer experience: reservation transactions complete in seconds instead of 15 minutes
• Shifted online purchasing from 50% of tickets sold to 65%
• Boosted revenue generated from e-ticket sales to INR 600 million daily
• Capable of scaling up as demand increases in the future
• CPU usage during peak hours (Tatkaal) is less than 9%
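Putting the throughput figures side by side, and assuming TPH here means tickets booked per hour (the slide does not expand the abbreviation), the improvement and the average per-second rate work out as:

```latex
\frac{150{,}000\ \text{per hour}}{60{,}000\ \text{per hour}} = 2.5\times,
\qquad
\frac{150{,}000}{3{,}600\ \text{s}} \approx 41.7\ \text{bookings per second (hourly average)}
```

Peak-second rates would be higher than this hourly average, since Tatkaal demand arrives in bursts.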
Roadmap
• HDFS persistence
• Off-heap storage
• Lucene indexes
• Spark integration
• Cloud Foundry service
• Distributed transactions
…and other ideas from the Geode community!
Geode  community
• http://geode.incubator.apache.org
• dev@geode.incubator.apache.org
• user@geode.incubator.apache.org
• http://github.com/apache/incubator-geode
Our in-memory computing journey
• We started the GemFire team in Pune in 2005; the core team has remained the same over the last decade
• We built a new product out of Pune, GemFire XD: in-memory distributed SQL built on GemFire and Apache Derby
• We are now working on a new initiative, SnappyData.io, a startup funded by Pivotal, building a product based on Spark (Streaming/SQL), GemFire and an approximate query engine
• And we are hiring
SnappyData Positioning (snappydata.io)
Deep integration of Spark + Gem(?):
• Streaming analytics
• Probabilistic data
• Distributed in-memory SQL
Unified cluster, AlwaysOn, cloud ready, for real-time analytics
Integrate with a deep-scale, high-volume MPP DB
Vision – Drastically reduce the cost and complexity in modern big data, using a fraction of the resources
10X better response time, drop resource cost 10X, reduce complexity 10X
