The Magic and the Mystery of In-Memory Apps
Taufik Ma – Industry Insight
Shaun Walsh - Marketeer
2 © 2015 G2M COMMUNICATIONS. All rights reserved.
Contents
• Why Use In-Memory Applications?
• Evolution towards & Role of In-Memory Computing
• Role of Storage in In-Memory Solutions
• Customer Trends
• Emerging Technologies & Some Predictions
• Summary
Magic and In-Memory Applications
Shaun Walsh - Marketeer
Are you ready for in-memory applications?
The Evolution of Storage Tiers
NVM will Accelerate Both Meta-Data & Application Data
NVDIMM Acceleration Segments
NVDIMM Type  Presentation         Access Method  Latency
-N           DRAM                 Byte           Consistent
-F           Storage              Block          Variable
-P           DRAM and/or Storage  Byte & Block   Variable
[Figure labels: Latency; Meta Data Acceleration; NVDIMM-N, NVDIMM-F, NVDIMM-P, 3D-XPoint]
• Data Base Log Files
• Clustering
• Cache Synchronization
• In-Memory DBs
• MemCacheD
• RAID
• De-Dupe
NVM-DIMM – fills growing DRAM-NAND gap
• In-Memory Applications are driving a new class of Storage Class Memory (SCM)
• Latency and persistence are as important as absolute bandwidth
• Byte and block address flexibility is vital to scaling In-Memory Applications (IMA)
[Figure label: Persistent Memory]
The Future of Business Intelligence
Latency and persistence are the new value currency for real-time applications & storage
Bandwidth & Capacity (old):
• Performance was measured in data rates (GB/s) & capacity (TB)
• Store Everything, Sort Later
• Higher Cost, Slow Decisions
Latency & Persistence (new):
• Real-Time is Business Critical
• Major Players Driving NVM
• Store the Vital & Analyze Now
Procter & Gamble - Real-Time Reporting & Business Decisions
https://hana.sap.com/abouthana/customer-stories/pg.html
• 35,000 retail, supply chain and business users supported
• 400% increase in decision support systems performance
• 55% smaller database: reduced from 36TB to 16TB, all in memory
P&G achieved faster, more reliable reporting and analytics
McLaren Group – Faster Formula 1
• Faster and more consistent lap times
• Improved down force for better grip
• Real-time telemetric analysis
• More World Championships
The Art and Science of In Memory Applications
Taufik Ma
Industry Insight
Evolution of Databases & Analytics
1980s: RDBMS
1990s: RDBMS, EDW/OLAP
2000s-2015: RDBMS, NoSQL, Hadoop, EDW/OLAP
• RDBMS: Operational (OLTP, ERM). Oracle, MS SQL, Sybase; MySQL, Postgres
• EDW/OLAP: Data Warehousing (data mining, DSS, analytics). Teradata, Oracle, SAS, etc; IBM Netezza, EMC Greenplum
• NoSQL: MongoDB, Cassandra
• Hadoop: MapReduce, HBase
Ongoing Evolution & Specialization…
                              Structured Data, Relational  Unstructured, Schema-less
Real-time, Online Operations  RDBMS                        NoSQL
Batch, Offline Analytics      EDW/OLAP                     Hadoop
Example workloads:
• OLTP, ERM
• Purchases, clicks
• User profiles, reviews
• Content Management
• User Segmentation
• Daily offer recommendation
• Ad serving engine
• Fraud Detection
Ongoing Evolution & Specialization… (adding in-memory)
                              Structured Data, Relational  Unstructured, Schema-less
Real-time, Online Operations  RDBMS                        NoSQL
Batch, Offline Analytics      EDW/OLAP                     Hadoop
New categories for real-time analytics:
• In-Memory Database: Hana, Exalytics, MemSQL, etc
• In-Mem Data Processing: Spark, Hadoop in-mem
Example workloads:
• OLTP, ERM
• Purchases, clicks
• User profiles, reviews
• Content Management
• Financial risk/value analysis
• Fraud Prevention
• Real-time recommendations
• Profitability analysis
• User Segmentation
• Daily offer recommendation
• Ad serving engine
• Fraud Detection
Multiple Tools Within A Customer
Customer Profiles (G2M Survey), in column order:
$500M+ Retail | $500M+ Pharma | $1B+ Manufacturing | $1B+ Pharma | $1B+ SaaS | $250M+ Healthcare
Hadoop: Yes | Yes | Yes | Yes | Yes | Yes
MongoDB: Yes | No plans | Yes | Yes | No plans (only five of the six values survive)
Spark: Yes | No plans | Considering | Yes, in 6 months | Yes | Yes, in 6 months
SAP HANA: No plans | Yes | Considering | Yes | No plans | Considering
Microsoft Hekaton: No plans | No plans | Considering | Yes, in 6 months | No plans | Yes, in 12 months
memSQL: No plans | No plans | Considering | Yes, in 6 months | No plans | Yes, in 12+ months
Oracle Exalytics: No plans | No plans | Yes | Yes | No plans | Yes, in 12+ months
“Specialized Tools for Specific Needs”
(Or “Too Many Data Islands”?)
Multiple In-Memory Applications within a Customer
How many in-memory applications do you (or will you) run?
1-5 6-10 More than 10
Key Enabler of In-Memory Computing: Today’s Technologies
Time to get data:
CPU L1 cache  0.001 usec
DRAM          0.01 usec
NAND          100 usec
HDD           10,000 usec
On a human scale: if I complete 50 operations in 50 seconds, then have to wait for data…
• DRAM = getting food from the fridge (10’s of seconds)
• NAND = taking the day off
• HDDs = hiking the Pacific Crest Trail (months)
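The human-scale analogy above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the mapping implied by "50 operations in 50 seconds": one L1 cache access (1 ns) is stretched to one second. The names `LATENCY_NS`, `human_seconds` and `describe` are illustrative only.

```python
# Latencies from the slide, in whole nanoseconds to avoid float rounding.
LATENCY_NS = {
    "CPU L1 cache": 1,        # 0.001 usec
    "DRAM": 10,               # 0.01 usec
    "NAND": 100_000,          # 100 usec
    "HDD": 10_000_000,        # 10,000 usec
}


def human_seconds(latency_ns: int, ns_per_human_second: int = 1) -> float:
    """Stretch time so that `ns_per_human_second` nanoseconds feel like one second."""
    return latency_ns / ns_per_human_second


def describe(seconds: float) -> str:
    """Render a duration in the largest convenient unit."""
    for unit, size in (("days", 86_400), ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.0f} seconds"


if __name__ == "__main__":
    for tier, ns in LATENCY_NS.items():
        print(f"{tier:>12}: {describe(human_seconds(ns))}")
```

Under that assumption DRAM lands at about 10 seconds, NAND at a little over a day, and HDD at roughly 116 days, which is in line with the fridge, day-off and months-long-hike comparisons.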
Performance Comes at a Price
Storage  Time to get data  Price / GB  Cost for 100TB  # 2U Servers Req’d to Hold 100TB*
DRAM     0.01 usec         $5.60       $560,000        130
NAND     100 usec          $0.35       $35,000         5
HDD      10,000 usec       $0.03       $3,000          2-3
Basis:
• DRAM: 32G DIMM for $179 ea, Samsung Registered DDR4, M393A4K40BB0-CPB0; 3125 x 32G DIMMs
• NAND: 2.5” 1TB SSD, $350 ea, Intel 540S; 100 x 2.5” 1TB SSD
• HDD: 3.5” 4TB SATA HDD for $120 ea, Seagate ST4000DM000; 25 x 3.5” 4TB SATA HDD
* Assuming 24 DIMM slots, 24x 2.5” drives or 12x 3.5” drives
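The price table can be reproduced from the unit prices and the slot counts in the footnote. The sketch below assumes decimal units (100TB = 100,000GB, as the 3125 x 32G figure implies) and rounds server counts up; `Device` and `size_for` are hypothetical names. Exact arithmetic gives $5.59/GB, $559,375 and 131 servers for DRAM, so the slide's $5.60, $560,000 and 130 appear to be rounded.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    capacity_gb: int       # decimal GB, as the slide counts them
    unit_price_usd: int    # price per device from the slide
    bays_per_2u: int       # slots per 2U server, from the slide's footnote


def size_for(device: Device, target_gb: int = 100_000) -> dict:
    """Units, $/GB, total cost and 2U server count needed to hold target_gb."""
    units = math.ceil(target_gb / device.capacity_gb)
    return {
        "units": units,
        "price_per_gb": round(device.unit_price_usd / device.capacity_gb, 2),
        "total_usd": units * device.unit_price_usd,
        "servers_2u": math.ceil(units / device.bays_per_2u),
    }


DEVICES = [
    Device("DRAM 32G DIMM", 32, 179, 24),
    Device("NAND 1TB SSD", 1_000, 350, 24),
    Device("HDD 4TB SATA", 4_000, 120, 12),
]

if __name__ == "__main__":
    for device in DEVICES:
        print(device.name, size_for(device))
```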
Location of Data & Tasks
Hadoop: MapReduce / HDFS
• Input file split into chunks 1, 2, 3, held on DISK
• Parallel tasks run next to each chunk
• JobTracker / Name Node sends tasks to data nodes
Spark / Tachyon
• Input file split into partitions (RDDs) 1, 2, 3, held in MEM
• Parallel tasks run next to each partition
• Spark Driver sends tasks to worker nodes
SAP Hana
• Input file split by user partitioning into 1, 2, held in MEM
• Local tasks run next to each partition
• Nodes: Master, Slave(s), Standby
Surviving Failures
Hadoop: MapReduce / HDFS
• Input files split into chunks 1, 2, 3 on DISK
• 3-fold replication: copies of each chunk are held on other nodes
• Parallel tasks
Spark / Tachyon
• Input files split into partitions (RDDs) 1, 2, 3 in MEM
• Lineage & checkpoints go to persistent storage
• Lineage: record of transformations that created an RDD from its “parent”
• Parallel tasks
SAP Hana
• Input files split by user partitioning into 1, 2 in MEM
• Logs & savepoints go to Ext Storage
• Nodes: Master, Slave(s), Standby
• Local tasks
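Lineage-based recovery can be illustrated with a toy model. The sketch below is not Spark's implementation: `Dataset` and `derive` are hypothetical names, and the source dataset stands in for input held in persistent storage. Each derived dataset remembers its parent and the transformation that produced it, so a lost in-memory copy is rebuilt by replaying that record.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Dataset:
    """Toy stand-in for an RDD partition: cached data plus its lineage."""
    parent: Optional["Dataset"]
    transform: Optional[Callable[[List[int]], List[int]]]
    cache: Optional[List[int]] = None  # in-memory copy; None means lost or not yet computed

    def compute(self) -> List[int]:
        if self.cache is None:
            if self.parent is None or self.transform is None:
                raise RuntimeError("source data lost and no lineage to rebuild it")
            # Replay the recorded transformation on the (possibly rebuilt) parent.
            self.cache = self.transform(self.parent.compute())
        return self.cache


def derive(parent: Dataset, transform: Callable[[List[int]], List[int]]) -> Dataset:
    """Create a child dataset that records how it was derived."""
    return Dataset(parent=parent, transform=transform)


# The source plays the role of input files kept in persistent storage.
source = Dataset(parent=None, transform=None, cache=[1, 2, 3, 4])
doubled = derive(source, lambda xs: [x * 2 for x in xs])
over_four = derive(doubled, lambda xs: [x for x in xs if x > 4])

first = over_four.compute()

# Simulate a node failure that wipes the in-memory copies.
doubled.cache = None
over_four.cache = None
recovered = over_four.compute()
```

Replication (Hadoop) pays for resilience in storage capacity; lineage pays in recomputation time after a failure, which is why checkpoints are used to cap how far back a replay has to go.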
No such thing as 100% In-Memory
Hadoop: MapReduce / HDFS
• HDFS2.0 Heterogeneous Storage: Storage Types & Policies
• Storage tiers per node: RAM_DISK, SSD, DISK*
• Files/directories assigned policies (e.g. Lazy_persist, All_SSD)
• * ARCHIVE tier not shown
Spark / Tachyon
• Tachyon Tiered Storage (for Off_heap Spark RDDs)
• Storage tiers per node: MEM, SSD, HDD
• Auto or manual
SAP Hana
• SAP HANA Dynamic Tiering: data spec’d as either Hot or Warm
• HOT: primary image in Mem
• WARM: primary image on Disk
• Caching
• Logs & savepoints go to Ext Storage
[Figure: input files a, b, c split into chunks or partitions and placed across the storage tiers of each node]
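The "auto" flavor of tiering can be sketched as a two-tier store that demotes the least recently used items once the hot tier is full. This is an illustration only, not how HDFS, Tachyon or HANA implement tiering (HDFS policies and HANA's hot/warm designation are declared rather than inferred from access patterns); `TieredStore` and its methods are hypothetical.

```python
from collections import OrderedDict


class TieredStore:
    """Two tiers: a bounded hot tier (memory) and an unbounded warm tier (disk)."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # stands in for DRAM; least recently used first
        self.warm = {}            # stands in for SSD/HDD

    def put(self, key, value):
        self.warm.pop(key, None)  # a key lives in exactly one tier
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._demote_if_full()

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as recently used
            return self.hot[key]
        value = self.warm.pop(key)     # raises KeyError if the key is unknown
        self.put(key, value)           # promote on access
        return value

    def tier_of(self, key):
        if key in self.hot:
            return "hot"
        return "warm" if key in self.warm else None

    def _demote_if_full(self):
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.warm[old_key] = old_value


store = TieredStore(hot_capacity=2)
for name in ("a", "b", "c"):
    store.put(name, name.upper())
after_puts = {k: store.tier_of(k) for k in "abc"}
value = store.get("a")  # promotes "a", demotes "b"
after_get = {k: store.tier_of(k) for k in "abc"}
```

A policy-driven variant would replace `_demote_if_full` with a lookup of the tier assigned to each key, which corresponds to the "manual" option.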
Customer In-Memory Computing Trends (based on G2M survey)
SIZE
• Cluster sizes similar to big data solutions
o ½ respondents > 500 servers, 1/3 at >50
o And not just for Spark
• With datasets that fit available DRAM capacity
o 1/3 at >100TB, 1/3 at >10TB
GROWTH
• ~Half with 10-20%+/yr dataset growth
• Majority use/want tiering when dataset > DRAM
o Only a minority would rely on scale-out only
• Mixed on whether tiering should be transparent or not
o Some want it transparent to the developer; the rest want the developer to have control via policy
EFFICIENCY
• ~Half believe “my storage capacity forces me to have more compute capacity than I need”
• Majority have or have plans for consolidated data silos
o OLTP+IMDB, Spark+Hadoop, NoSQL+Hadoop
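The growth figures explain the interest in tiering: a dataset that fits in DRAM today may not fit for long. A minimal sketch of the compounding, where the 80TB dataset and 100TB DRAM capacity are hypothetical values chosen for illustration and `years_until_exceeds` is an illustrative name:

```python
def years_until_exceeds(dataset_tb: float, dram_tb: float, annual_growth: float) -> int:
    """Whole years of compound growth before the dataset no longer fits in DRAM."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    years = 0
    size = dataset_tb
    while size <= dram_tb:
        size *= 1 + annual_growth
        years += 1
    return years


if __name__ == "__main__":
    for growth in (0.10, 0.20):
        years = years_until_exceeds(80, 100, growth)
        print(f"{growth:.0%}/yr: outgrows DRAM after {years} years")
```

With these assumed numbers the dataset outgrows DRAM in three years at 10% growth and in two years at 20%.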
Emerging Technologies: High-speed Fabrics & Disaggregated Storage
• Ethernet or PCIe based fabric
• DAS-like performance, local or SAN
• Map any drive to any host
• Scale each storage tier separately from compute
• Early proof points: EMC DSSD, SanDisk InfiniFlash, DriveScale
• Data Center Ethernet speeds ramping faster than drive speeds: 10/25/40/50/100G
• RDMA-over-Ethernet technologies
• Multi-host PCIe fabrics emerging (e.g. OCP Lightning) albeit w/ less scalability
[Figure: CPUs connected over a low latency fabric to separate DRAM, NAND and HDD tiers]
[Chart: Ethernet speeds (10G, 25G, 40G, 50G, 100G, …) and drive interfaces (SATA/SAS, NVMe PCIe x4 Gen3) over time]
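One way to read the Ethernet-versus-drive comparison is to ask how many drives a single link can carry at full interface speed. The sketch below assumes nominal interface rates before encoding and protocol overhead: 6 Gb/s for SATA, 12 Gb/s for SAS and 32 Gb/s for NVMe on PCIe x4 Gen3 (4 lanes at 8 GT/s). The slide does not state the SATA/SAS generations, so those two values are assumptions.

```python
LINK_GBPS = [10, 25, 40, 50, 100]

# Nominal interface rates in Gb/s (assumed generations, overhead ignored).
DRIVE_GBPS = {
    "SATA": 6,
    "SAS": 12,
    "NVMe PCIe x4 Gen3": 32,
}


def drives_per_link(link_gbps: int, drive_gbps: int) -> int:
    """Number of drives one link can serve with every drive at full interface rate."""
    return link_gbps // drive_gbps


if __name__ == "__main__":
    for link in LINK_GBPS:
        row = ", ".join(
            f"{name}: {drives_per_link(link, rate)}" for name, rate in DRIVE_GBPS.items()
        )
        print(f"{link:>3}G -> {row}")
```

On these assumptions a 10G link cannot keep up with even one NVMe drive, while a 100G link carries about three, which is what makes fabric-attached flash with DAS-like performance plausible.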
Emerging Technologies: Storage Class Memory
Storage    Persistence  Time to access data  Price / GB  Cost for 100TB  # 2U Servers Req’d to Hold 100TB*
DRAM       N            10ns+                $5.60       $560,000        130
NVDIMM-N   Y            10ns+                $10+        $1,000,000+     260
3DXP DIMM  (not given)  100ns Rd, 500ns Wr   $2+         $190,000+       ~50
NAND       Y            100 usec             $0.35       $35,000         5
HDD        Y            10,000 usec          $0.03       $3,000          2-3
Basis:
• DRAM: 3125 x 32G DIMMs
• NVDIMM-N: price if 2X+ DRAM; 16G NVDIMM, supercap
• 3DXP DIMM: price if 1/3+ DRAM; assuming 96 or 128GB DIMMs
• NAND: 2.5” 1TB SSD, $350 ea, Intel 540S; 100 x 2.5” 1TB SSD
• HDD: 3.5” 4TB SATA HDD for $120 ea, Seagate ST4000DM000; 25 x 3.5” 4TB SATA HDD
* Assuming 24 DIMM slots, 24x 2.5” drives or 12x 3.5” drives
In-Memory Computing Predictions / Trends
1. 3DXP DIMMs used for “Jumbo Memory” – value in lower $/GB vs DRAM, not persistence
– Mix of 3DXP & DRAM DIMMs in server nodes
– Tiering will be tuned to accommodate slower writes & reads
– Spark, In-mem Hadoop, MemSQL, Hana, etc
– NVDIMM-P might have similar adoption but predictable latency is a concern
2. Increasing use of NVMe SSDs as “Far Memory” – as next tier (below DRAM/3DXP)
– Priority on $/TB, not persistence. Resiliency still via lineage, logs, etc
– Remove “last-inch” of latency via BLKB (block-layer/kernel bypass) stacks (e.g. EMC libflood, SPDK)
– Implemented as a fabric-disaggregated cluster to enable efficiency & independent scalability
– Longer-term, HW-based paging of near-memory to far-memory
3. Use of “Persistent Memory” for In-Mem computing will evolve
– For 3DXP & NVDIMM-N
– Industry progress on pmem file systems (Linux, Windows)
– Does persistence replace or complement lineage/logs?
– Need low latency replication across nodes (PMoF)
Summary
• In-memory solutions growing in adoption – driven by real-time analytics
• Co-existence of structured (e.g. Hana) and unstructured frameworks (e.g. Spark)
• Confluence of big-data & real-time analytics drives increasing adoption of tiering
• Newer technologies on horizon will continue to create disruptions to in-memory computing architectures

Editor's Notes

  • #13: Clean up graphics. Double check categories
  • #14: Doublecheck app examples
  • #15: Double check category names. And examples
  • #20: Need example application applies to all three Double check Hana master/slave
  • #21: Check storage of lineage