Hybrid Transaction/Analytical Processing:
Beyond the Big Database Hype
Ali Hodroj
Vice President, Products and Strategy
Agenda
• Drivers for HTAP
• Emergence of insight-driven transformation
• GigaSpaces Solution for HTAP
• Reference Architecture and Case Studies
About GigaSpaces
GigaSpaces provides Cloud native In-Memory
Compute middleware for mission-critical
applications.
GigaSpaces IMC serves more than 500 large
enterprises & ISVs, over 50 of which are
Fortune-listed.
• Direct customers: 300+
• Fortune / Organizations: 50+ / 500+
• Large installations in production (OEM): 5,000+
• ISVs: 25+
Why Hybrid
Transactional
Analytics
Processing?
The economic value of insight-driven transformation
$13.01 returned for every $1 a company spends on data management and analytics
Source: MIT Sloan, Nucleus Research
74% of firms say they want to be data-driven, but only 23% are successful
Source: Forbes: Actionable Insight: Missing Link between Data and Value
2x: companies are twice as likely to outperform their peers if they use advanced analytics
Source: MIT Sloan
[Figure: business value (positive to negative) plotted against time to act. Stages along the timeline: data & transactions created; extract, transform, load; run analytics (stale insights); decision made (outdated decisions); trigger action (irrelevant actions).]
Fast Data Analytics = Immediate Business Value
Data is generated in real-time, while analytics and insight fall behind
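To make the lag concrete, here is a minimal Python sketch (not taken from the presentation) of how long a record waits before batch analytics can see it. The function names and the 60-minute ETL interval are assumptions chosen purely for illustration.

```python
import math


def visible_at(created_at, etl_interval):
    """Return the time of the first ETL run at or after created_at.

    Assumes ETL runs at multiples of etl_interval (a hypothetical schedule).
    """
    return math.ceil(created_at / etl_interval) * etl_interval


def insight_lag(created_times, etl_interval):
    """Minutes each record waits before analytics can query it."""
    return [visible_at(t, etl_interval) - t for t in created_times]


# Records created at minute 1, 30, 59 and 61; ETL runs every 60 minutes.
lags = insight_lag([1, 30, 59, 61], etl_interval=60)
average_lag = sum(lags) / len(lags)
print(lags, average_lag)
```

In this toy schedule a record created just after an ETL run waits almost a full interval, which is the "stale insights" effect the slide describes.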
Insight-centric systems demand hyperscale analytics
(Case study: intelligent omni-channel commerce)
[Figure: processing styles (batch, machine learning & event processing, streaming) on a latency axis: hours, minutes, seconds, sub-second, milliseconds, microseconds. Use cases shown: revenue, customer segmentation, product recommendations, hyperlocal advertising, real-time pricing, predictive search and user interfaces.]
In-Memory Computing enables HTAP
Clearing the hype:
HTAP and the big
(database)
monolith
Evolution of big databases towards HTAP
Traditional Relational Database (Traditional)
• Query engine for either transactional OR analytics workloads
• Single storage engine
• Vertically scalable
In-Memory or MPP Database (HTAP)
• Single query engine for both workloads
• Multiple storage engines (row-based and column-based)
• Leverages memory to speed up I/O
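As a rough illustration of why an HTAP database keeps both row-based and column-based storage engines, the Python sketch below holds the same orders in two layouts. The class and field names are hypothetical and do not come from any specific product.

```python
class RowStore:
    """Row-based engine: each record is stored whole, keyed by primary key."""

    def __init__(self):
        self.rows = {}

    def upsert(self, order_id, customer, amount):
        # Transactional path: touch exactly one record.
        self.rows[order_id] = {"customer": customer, "amount": amount}

    def get(self, order_id):
        # Point query served directly by the key index.
        return self.rows[order_id]


class ColumnStore:
    """Column-based engine: each attribute is stored as its own array."""

    def __init__(self, row_store):
        ordered = sorted(row_store.rows.items())
        self.order_ids = [order_id for order_id, _ in ordered]
        self.customers = [row["customer"] for _, row in ordered]
        self.amounts = [row["amount"] for _, row in ordered]

    def total_amount(self):
        # Analytical path: scan a single column, ignoring the others.
        return sum(self.amounts)

    def total_by_customer(self):
        totals = {}
        for customer, amount in zip(self.customers, self.amounts):
            totals[customer] = totals.get(customer, 0) + amount
        return totals


rows = RowStore()
rows.upsert(1, "acme", 120)
rows.upsert(2, "globex", 80)
rows.upsert(3, "acme", 50)
rows.upsert(2, "globex", 95)  # in-place update of one record

columns = ColumnStore(rows)
print(rows.get(2), columns.total_amount(), columns.total_by_customer())
```

The row layout suits single-record inserts, updates and point lookups, while the column layout suits aggregations that scan one attribute across many records.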
Yet analytics evolved much faster
Insight-driven transformation requires:
• Applications with polyglot persistence
(microservices, multiple data sources)
• Analytics are mostly real-time,
streaming, and predictive
• Iterative data science – modeling against
live data for continuous machine and
deep learning
[Figure: business value (low to high) plotted against time horizon (past to future). Business Intelligence: historical reporting (What happened?). Data Science: predictive analytics (What will happen?) and prescriptive analytics (What should I do?). An "(HTAP)" label also appears on the chart.]
The Open Source
and In-Memory
Insight Platform
Approach
HTAP = Spark + In-Memory Data Grid
Large-scale distributed
analytics framework
Unified, scale-out, low-latency data store
Transactional capabilities:
ACID, Event-Driven, Rich Data
modeling
Microservices
GigaSpaces XAP In-Memory Data Grid
Elastic Scale-out In-Memory Storage
(Shared-nothing, Linear scalability, Elastic capacity)
Low latency and high throughput
(co-located ops, event-driven, fast indexing)
High availability and Resiliency
(auto-healing, multi-data center replication, fault tolerance)
Rich API and Query Language
(SQL, Spring, Java, .NET, C++)
Geo-Spatial Full Text
In-Memory Data Grid + Spark Convergence
Why Spark?
• Unified & Concise API
• Highly Flexible Data Store Integration
• Massive Community and Adoption
Why In-Memory Data Grid?
SQL-99, Polyglot Data & Search
• SQL ‘99
• Graph
• JSON
• POJO
• GeoSpatial
• Full Text
Multi-Tiered Data Storage
• RAM
• SSD/Flash
• Storage-Class Memory (3D XPoint)
Cloud-Native and Horizontally Scalable
Distributed In-Grid Analytics
• SQL
• Streaming
• Machine Learning
• Graph Processing
• Deep Learning
• Text mining
• Geospatial
Advanced In-Grid Transactions and Analytics Processing
• In-Memory Event-Driven Processing
• Distributed Tasks and Compute Grid
• Real-time Web Services
• In-Memory Aggregations
Embracing an open source analytics ecosystem
Pick your own fast data architecture (lambda, kappa) and co-locate transaction processing
[Diagram: Simplified Lambda Architecture (Realtime + Historical) with GigaSpaces, Kafka, Spark, and Hadoop]
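The lambda option (realtime plus historical) can be sketched in a few lines of Python. This is a generic illustration of the pattern under assumed names (`batch_view`, `SpeedLayer`, `serve`), not the GigaSpaces, Kafka, or Spark APIs.

```python
from collections import Counter


def batch_view(historical_events):
    """Batch layer: recompute totals from the full historical data set."""
    view = Counter()
    for product, quantity in historical_events:
        view[product] += quantity
    return view


class SpeedLayer:
    """Speed layer: incrementally updated totals for events not yet batched."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, product, quantity):
        self.view[product] += quantity


def serve(batch, speed, product):
    """Serving layer: merge the historical and realtime views at query time."""
    return batch[product] + speed.view[product]


batch = batch_view([("book", 3), ("lamp", 1), ("book", 2)])
speed = SpeedLayer()
speed.on_event("book", 4)
speed.on_event("pen", 7)

print(serve(batch, speed, "book"), serve(batch, speed, "lamp"), serve(batch, speed, "pen"))
```

A kappa-style design would drop the separate batch layer and replay the event stream through the same incremental code path instead.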
Reference
Architecture
Unified HTAP Architecture
[Diagram: three nodes. Node 1 runs the Spark master and the grid master; nodes 2 and 3 each run a Spark worker and a grid partition. Annotations: "Lightweight workers, small JVMs" and "Large JVMs, fast indexing".]
• Push-down predicates (ultra-low latency processing,
30x performance improvement)
• Stateful data-360 sharing across analytics jobs
• Data-locality for high throughput
• Five 9s High Availability
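The idea behind push-down predicates and data locality can be shown with a toy partitioned store in Python. Everything here (the `Partition` class, the hash routing, the order data) is a hypothetical sketch; it only counts how many rows leave the partitions and makes no claim about the 30x figure above.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One in-memory grid partition holding a slice of the rows (hypothetical)."""
    rows: list = field(default_factory=list)

    def scan(self):
        # No push-down: every row leaves the partition.
        return list(self.rows)

    def scan_where(self, predicate):
        # Push-down: the predicate runs next to the data,
        # so only matching rows leave the partition.
        return [row for row in self.rows if predicate(row)]


def build_grid(num_partitions, orders):
    """Route each order to a partition by its id (shared-nothing layout)."""
    grid = [Partition() for _ in range(num_partitions)]
    for order in orders:
        grid[order["id"] % num_partitions].rows.append(order)
    return grid


def query_without_pushdown(grid, predicate):
    moved = [row for part in grid for row in part.scan()]
    return [row for row in moved if predicate(row)], len(moved)


def query_with_pushdown(grid, predicate):
    moved = [row for part in grid for row in part.scan_where(predicate)]
    return moved, len(moved)


def is_large(row):
    return row["amount"] >= 9900


orders = [{"id": i, "amount": i * 10} for i in range(1000)]
grid = build_grid(3, orders)

slow_result, slow_moved = query_without_pushdown(grid, is_large)
fast_result, fast_moved = query_with_pushdown(grid, is_large)
print(slow_moved, fast_moved)
```

Both queries return the same rows; the difference is how much data has to travel from the grid to the analytics job before filtering.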
Decoupled HTAP Architecture
[Diagram: In-Memory Data Grid with realtime replication between a Transactions side and an Analytics side. Labels: scoring models, trigger actions, events. Audiences shown: application developers; data scientists & analysts.]
• Useful when analytics are mostly batch or long-running queries.
• Analytics grid can be used for frequent model training (CPU intensive), without impacting transactional apps
• Flexibility to scale write-heavy (transactions) and read-heavy (analytics) workloads independently
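A minimal Python sketch of the decoupled layout follows: a transactional store logs its writes and a separate analytics replica applies them. The class names and the explicit `replicate` step are assumptions for illustration; a real grid would replicate continuously rather than on demand.

```python
from collections import deque


class TransactionalGrid:
    """Write-heavy side: applies each write and logs it for replication."""

    def __init__(self):
        self.data = {}
        self.replication_log = deque()

    def put(self, key, value):
        self.data[key] = value
        self.replication_log.append((key, value))


class AnalyticsGrid:
    """Read-heavy side: a replica that serves scans and model training."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def total(self):
        return sum(self.data.values())


def replicate(source, target):
    """Drain pending changes from the source log into the replica."""
    applied = 0
    while source.replication_log:
        target.apply(*source.replication_log.popleft())
        applied += 1
    return applied


tx = TransactionalGrid()
analytics = AnalyticsGrid()

tx.put("order-1", 120)
tx.put("order-2", 80)
replicate(tx, analytics)
tx.put("order-1", 150)  # update not yet replicated

stale_total = analytics.total()
replicate(tx, analytics)
fresh_total = analytics.total()
print(stale_total, fresh_total)
```

Because scans run only against the replica, heavy analytics never compete with transactional writes, at the cost of a short window in which the replica is behind.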
Case Studies
Case Study: Magic Software
IoT Hub + Predictive Analytics (Automotive Telematics)
Challenge:
• Implement predictive analytics and anomaly detection
• Expand insight context through customer/data-360
integration
• Trigger transactional workflows based on prediction criteria
Solution:
• Simplified HTAP with Streaming data pipeline (3 tiers)
• IoT streaming analytics with 9s high availability
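The presentation does not say how anomalies are detected, so the Python sketch below swaps in a simple rolling z-score as a stand-in. It shows the shape of the flow only: telemetry is ingested, an anomaly is detected, and a transactional workflow is triggered. The class, callback, and readings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


class TelemetryMonitor:
    """Flags readings that deviate strongly from a vehicle's recent history."""

    def __init__(self, window, threshold, on_anomaly):
        self.window = window          # number of recent readings to compare against
        self.threshold = threshold    # z-score above which a reading is anomalous
        self.on_anomaly = on_anomaly  # callback that starts the workflow
        self.history = defaultdict(list)

    def ingest(self, vehicle_id, reading):
        recent = self.history[vehicle_id][-self.window:]
        if len(recent) >= 2:
            spread = stdev(recent)
            if spread > 0 and abs(reading - mean(recent)) / spread > self.threshold:
                self.on_anomaly(vehicle_id, reading)
        self.history[vehicle_id].append(reading)


work_orders = []


def open_work_order(vehicle_id, reading):
    # Stand-in for a transactional workflow such as booking a service visit.
    work_orders.append((vehicle_id, reading))


monitor = TelemetryMonitor(window=5, threshold=3.0, on_anomaly=open_work_order)
for temperature in [90, 91, 89, 90, 92, 130, 91]:
    monitor.ingest("truck-7", temperature)
print(work_orders)
```

Only the 130 reading opens a work order; the readings before and after it stay within the threshold of the recent window.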
“GigaSpaces enables our
customers to simplify and
accelerate telemetry
ingestion, to gain full
business value from IoT
adoption.”
Yuval Lavi, VP of Innovation
Magic Software
http://www.magicsoftware.com
Key Takeaways
By the end of this presentation, you hopefully understand that:
➔ HTAP is not just a database problem!
Capturing business value from real-time apps requires more than a hybrid
database. Look into distributed analytics frameworks for speed of
innovation
➔ Hyperscale analytics require the combination of several tools
Open source analytics provide better long term ROI for implementing both
BI analytics and Data Science, while reducing architecture complexity.
➔ Try it all out – It’s open source!
http://insightedge.io / http://gigaspaces.com
http://github.com/InsightEdge
http://insightedge.slack.com
hello@insightedge.io
Book a demo:
Q&A


Editor's Notes

  • #8: We’re talking today about HTAP, and analytics in general, because the economic value of insight-driven transformation is undeniable. Recent research shows really interesting numbers for what you might call insight-driven businesses. From an ROI perspective, firms are seeing a 1,300% ROI. The majority of those who haven’t become fully insight-driven, about 74%, already have plans for introducing analytics at every corner of their business. This is mainly due to the recognition that analytics serves not only as a means of differentiation but also as a fast innovation engine, helping firms be twice as innovative and outperform their peers.
  • #9: Which brings us to the business value of analytics. Recent years have seen the need for more real-time analytics. In addition, mobile and IoT have given rise to a new generation of applications characterized by heavy ingest rates (they produce large amounts of data in a short time) and by their need for more real-time analysis. Enterprises are pushing for more real-time analysis of their data to drive competitive advantage, so they need the ability to run analytics on their operational data as soon as possible. Becoming truly insight-driven and innovating like Amazon requires a departure from traditional analytics infrastructures.
  • #10: Speaking of Amazon, one interesting use case we see quite often in retail is the ability to become an omni-channel retailer, which requires what we call “hyperscale analytics.” Let’s take a look.
  • #11: Now, fortunately, there have been advances in distributed computing that help us realize this vision. Thanks to the declining price of RAM and advancements in SSD storage, in-memory computing is becoming mainstream. For those not familiar, in-memory computing means using RAM as the primary storage medium for business and analytics, thereby eliminating any form of disk I/O or network I/O latency and operating at millisecond latencies at very high throughput. To understand HTAP, we first need to look into OLTP and OLAP systems and how they progressed over the years. Relational databases have been used for both transaction processing and analytics. However, OLTP and OLAP systems have very different characteristics. OLTP systems are identified by their individual record insert/delete/update statements, as well as point queries that benefit from indexes. One cannot think about OLTP systems without indexing support. OLAP systems, on the other hand, are updated in batches and usually require scans of the tables. Batch insertion into OLAP systems is an artifact of ETL (extract, transform, load) systems that consolidate and transform transactional data from OLTP systems into an OLAP environment for analysis.
  • #12: If you read Gartner’s report on HTAP, you’ll see that most of the vendors are actually classic database vendors. Now what does it mean to have an HTAP architecture? We see quite a lot of people fall into the trap of thinking about HTAP as acquiring a large vertically scalable database (like SAP HANA, Oracle, or others).
  • #13: To understand HTAP, we first need to look into how databases evolved from the traditional OLTP vs OLAP world to the modern HTAP. As HTAP and realtime analytics became a necessity, we started seeing database vendors go outside their swim-lanes to introduce built-in LRU caching mechanisms (using In-Memory).
  • #14: The reality of insight-driven transformation is that it requires a wide scope of analytics. HTAP databases are simply focused on BI-type workloads (reporting queries).
  • #15: At the same time, the last decade has seen an explosion of big data and in-memory computing technologies, driven by new-generation applications. NoSQL or key-value stores, such as Voldemort, Cassandra, and RocksDB, offer fast inserts and lookups and very high scale-out, but are limited in their query capabilities and offer only loose transactional guarantees (see Mohan’s tutorial [25]). There have also been many SQL-on-Hadoop offerings, including Hive [36], Big SQL [15], Impala [20], and Spark SQL [3], that provide analytics capabilities over large data sets, focusing on OLAP queries only and lacking transaction support. Although all these systems support queries over text and CSV files, their focus has been on columnar storage formats.
  • #16: HTAP solutions today follow a variety of design practices. One of the major design decisions HTAP systems have to make is whether or not to use the same engine for both OLTP and OLAP requests. One approach is to keep the OLTP and OLAP systems decoupled and combine them for HTAP, leaving it up to the applications to maintain the hybrid architecture. The operational data in the OLTP system is aged to the OLAP system using a standard ETL process. In fact, this is very common in the big data world, where applications use a fast key-value store like Cassandra for transactional workloads, and the operational data is groomed into Parquet or ORC files on HDFS for a SQL-on-Hadoop system to query. As a result, however, there is a lag between the data the OLAP system can query and the data the OLTP system sees.
  • #20: One common API for all workloads. Tap into other data stores on demand. Data science is in high demand but short supply, so the ability to leverage the know-how, capability, and production readiness eliminates a lot of pain points.
  • #21: First reason is speed: “In-memory computing (IMC) … provides transformational opportunities. The execution of certain types of hours-long batch processes can be squeezed into minutes or even seconds … Millions of events can be scanned in a matter of a few tens of milliseconds to detect correlations and patterns pointing at emerging opportunities and threats ‘as things happen.’” Besides that, in-memory data grids are proving to be very mature containers for real-time applications. While they started in finance and
  • #25: The goal is to provide a unified environment where application developers and data scientists can collaborate. Data science is itself an iterative activity that requires a lot of trial and error.