© 10x Banking Technology Limited 2022. All rights reserved
Stuart Coleman – 10x Banking
Enabling product personalization using Kafka, Pinot and Trino
05 April 2022
What are we going to talk about?
1. Why your bank has not offered you new products for a long time.
2. Breaking the monolith of core banking systems and liberating data from the core.
3. Taking advantage of that hard work to offer new and dynamic product personalization.
4. Our experiences are based on transactions and accounts (but you should be able to substitute purchases and customers, or movies and users).
What are we not going to talk about?
1. Lots of detail on Kafka, Pinot or Trino architecture.
What is core banking and how does it work?
How does my bank work?
Core banking is conceptually easy:
• Customers onboard to the bank and subscribe to one or more products.
• They make and receive payments.
• Payments are booked into the ledger synchronously, in real time (to avoid double spending).
• Products define a series of lifecycle steps which happen after payments are posted:
  • Interest calculation
  • Fees and rewards
  • Reporting and accounting
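Why booking has to be synchronous can be sketched with a toy ledger. This is a minimal in-memory illustration, assuming a single process in which a lock stands in for the ledger's transactional guarantees; the Ledger class, account names and amounts are hypothetical and not 10x's implementation. The balance check and both postings happen as one atomic step, so two concurrent payments can never both spend the same funds.

```python
import threading


class InsufficientFunds(Exception):
    """Raised when a payment would overdraw the debtor account."""


class Ledger:
    """Hypothetical in-memory ledger; balances are in minor units (pence)."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._entries = []             # append-only list of booked postings
        self._lock = threading.Lock()  # serializes check-and-book

    def post_payment(self, payment_id, debtor, creditor, amount):
        # Check and book under one lock: no other payment can observe the
        # debtor balance between the check and the debit.
        with self._lock:
            if self._balances[debtor] < amount:
                raise InsufficientFunds(payment_id)
            self._balances[debtor] -= amount
            self._balances[creditor] += amount
            self._entries.append((payment_id, debtor, -amount))
            self._entries.append((payment_id, creditor, amount))

    def balance(self, account):
        with self._lock:
            return self._balances[account]


ledger = Ledger({"alice": 100, "bob": 0})


def attempt(i):
    try:
        ledger.post_payment(f"p{i}", "alice", "bob", 30)
    except InsufficientFunds:
        pass  # rejected: funds already spent by an earlier payment


threads = [threading.Thread(target=attempt, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ledger.balance("alice"), ledger.balance("bob"))  # 10 90
```

Ten concurrent attempts to pay 30 from a balance of 100 result in exactly three bookings; the rest are rejected rather than overdrawing the account.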
How does my bank work?
• Correctness is absolute – it’s a bank!
• The status quo is to use a mainframe:
  • Highly performant, available and reliable.
• Consistency is much easier in a monolith.
• Applications communicate directly with the mainframe.
• Single monolithic shared databases.
Courtesy of https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mainframe/ibm-zos-online-transaction-processing-azure
Domain-driven design, microservices and data encapsulation
Retrieving balances and adding a cleansed merchant name to a transaction
Retrieving balances and adding a cleansed merchant name to a transaction – mainframe version
Courtesy of https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mainframe/ibm-zos-online-transaction-processing-azure
• No new components
• Front-end and business logic components need to be modified
• Required new data fields must be added to the monolithic data layer
• Complex and risky change
Data Dichotomy
Taken from https://www.confluent.io/en-gb/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/
Data Dichotomy
Event-Driven Design
Event-Driven Design
Correctness and integrity baked in – the Outbox pattern
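The Outbox pattern can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: SQLite stands in for the service's own database, a Python callback appending to a list stands in for a Kafka producer, and the table, column and function names (book_payment, relay_outbox) are hypothetical. The business row and the outbox row are written in one local transaction, so an event exists if and only if the state change committed; a separate relay then publishes unsent rows and marks them sent.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (id TEXT PRIMARY KEY, account TEXT, amount INTEGER);
CREATE TABLE outbox (
    event_id TEXT PRIMARY KEY,
    topic    TEXT NOT NULL,
    payload  TEXT NOT NULL,
    sent     INTEGER NOT NULL DEFAULT 0
);
""")


def book_payment(payment_id, account, amount):
    # One local transaction: both rows commit together or neither does.
    with conn:
        conn.execute("INSERT INTO payments VALUES (?, ?, ?)",
                     (payment_id, account, amount))
        payload = json.dumps({"id": payment_id, "account": account, "amount": amount})
        conn.execute("INSERT INTO outbox (event_id, topic, payload) VALUES (?, ?, ?)",
                     (payment_id, "payments", payload))


def relay_outbox(publish):
    # Runs outside the business transaction: publish first, then mark sent.
    rows = conn.execute(
        "SELECT event_id, topic, payload FROM outbox WHERE sent = 0 ORDER BY rowid"
    ).fetchall()
    for event_id, topic, payload in rows:
        publish(topic, event_id, payload)  # hypothetical stand-in for a Kafka send
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE event_id = ?", (event_id,))


published = []
book_payment("p1", "acc-1", 2500)
relay_outbox(lambda topic, key, value: published.append((topic, key, value)))
print(published[0][:2])  # ('payments', 'p1')
```

Because the relay marks a row as sent only after publishing, delivery is at-least-once rather than exactly-once.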
Putting core banking data to use for customers
Your bank account today
Let’s build some new products
What dimensions are interesting?
What we need
The ability to compute analytical aggregates over data, filtered by attributes held in other domains
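To make that requirement concrete, here is the shape of such a query: an aggregate over transactions (payments domain) filtered by an attribute owned by another domain (a customer segment). This is a hypothetical sketch using SQLite only as a convenient SQL engine; the tables, columns and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (txn_id TEXT, account_id TEXT, merchant_category TEXT, amount INTEGER);
CREATE TABLE accounts (account_id TEXT, customer_segment TEXT);
INSERT INTO transactions VALUES
  ('t1', 'a1', 'groceries', 4000),
  ('t2', 'a1', 'coffee',     350),
  ('t3', 'a2', 'coffee',     400),
  ('t4', 'a2', 'groceries', 6000),
  ('t5', 'a3', 'coffee',     300);
INSERT INTO accounts VALUES ('a1', 'student'), ('a2', 'student'), ('a3', 'retired');
""")

# Aggregate from one domain, filter from another: spend per merchant
# category, restricted to customers in a given segment.
query = """
SELECT t.merchant_category, SUM(t.amount) AS total_spend, COUNT(*) AS txn_count
FROM transactions AS t
JOIN accounts AS a ON a.account_id = t.account_id
WHERE a.customer_segment = ?
GROUP BY t.merchant_category
ORDER BY t.merchant_category
"""
rows = conn.execute(query, ("student",)).fetchall()
print(rows)  # [('coffee', 750, 2), ('groceries', 10000, 2)]
```

The join is what makes this hard once the two tables live in separate services, each encapsulating its own data.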
Possible solutions – data warehouse + real-time component
Possible solutions – data warehouse + real-time component
Pre-aggregation for reliable real-time query latency
Pre-aggregation gives reliable query latency, but:
• Dimensions need to be known beforehand
• One record generates multiple aggregates
• Dimension and storage explosion
• Difficult to scale
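The dimension explosion can be shown with a short sketch. It assumes a cube-style pre-aggregation, where every combination of the chosen dimensions is maintained as its own aggregate; the dimension names and records are hypothetical. With d dimensions, a single record updates 2^d aggregates, and every new dimension value creates new keys.

```python
from itertools import combinations


def preaggregate(record, dimensions, aggregates):
    """Add one record to every group-by combination of the given dimensions."""
    for r in range(len(dimensions) + 1):
        for dims in combinations(dimensions, r):
            key = tuple((d, record[d]) for d in dims)  # () is the grand total
            total, count = aggregates.get(key, (0, 0))
            aggregates[key] = (total + record["amount"], count + 1)


dimensions = ["merchant_category", "city", "channel", "customer_segment"]
aggregates = {}

txn = {"merchant_category": "coffee", "city": "London", "channel": "card",
       "customer_segment": "student", "amount": 350}
preaggregate(txn, dimensions, aggregates)
print(len(aggregates))  # 16 (2 ** 4 aggregates from a single record)

# A second record differing only in city creates 8 new keys (every
# combination that includes city) and updates the other 8.
preaggregate(dict(txn, city="Leeds"), dimensions, aggregates)
print(len(aggregates))  # 24
```

Adding a fifth dimension would double the per-record cost again, and any dimension not chosen up front simply cannot be queried.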
Flexibility vs Latency
Pinot and Trino in 1 minute
Pinot is a purpose-built data store for ultra-low-latency analytics at high throughput:
• Column-oriented
• Powerful indexing techniques for low-latency aggregation and filtering
• Horizontally scalable
• Supports highly concurrent queries
Trino is a distributed, ANSI SQL-compliant query engine:
• Pluggable connector architecture, which allows querying across many data stores (including Pinot)
• Pushes filters and aggregations down to the underlying data store where the connector supports it
• Built for low latency and efficiency, even on large batch queries
Bridging the gap with Pinot and Trino
Handling both types of queries with Pinot and Trino
• Single copy of data
• No need to maintain two ingest pipelines
• Scalable horizontally by adding more Pinot servers and Trino workers
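With a single copy of the data in Pinot, the remaining decision is which engine serves a given query. The talk states that this depends on query size; the router below is a hypothetical sketch of that idea, not the talk's code. QuerySpec, choose_engine and the PINOT_MAX_ROWS cut-off are all invented for illustration, and a real threshold would come from load testing.

```python
from dataclasses import dataclass

# Hypothetical cut-off; a real value would come from load testing.
PINOT_MAX_ROWS = 1_000_000


@dataclass
class QuerySpec:
    needs_full_join: bool  # needs SQL joins beyond Pinot's lookup joins
    estimated_rows: int    # rows the query is expected to scan


def choose_engine(query: QuerySpec) -> str:
    """Route a query over the single Pinot copy of the data."""
    if query.needs_full_join:
        return "trino"  # Trino performs the join, reading Pinot through its connector
    if query.estimated_rows <= PINOT_MAX_ROWS:
        return "pinot"  # small, latency-sensitive aggregation served directly by Pinot
    return "trino"      # large scans are spread across Trino workers


print(choose_engine(QuerySpec(needs_full_join=False, estimated_rows=5_000)))       # pinot
print(choose_engine(QuerySpec(needs_full_join=False, estimated_rows=50_000_000)))  # trino
```

Both paths read the same Pinot tables, which is what removes the need for a second ingest pipeline.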
How much to denormalize?
• Pinot has limited support for lookup joins, but not fully featured SQL joins
• Aggregation, filtering and grouping are strong, with a wide range of indexes to speed up queries
• For our use case, it was most practical to pre-join outside of Pinot and ingest the pre-joined topic. Aggregations and GROUP BYs are performed in Trino or Pinot, depending on query size
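Pre-joining outside Pinot amounts to enriching each transaction event with the latest reference data from other domains and emitting one flat record. The talk does not say which tool performs the join; in this hypothetical sketch, plain dictionaries stand in for the latest-value-per-key state a stream processor would keep (similar to a Kafka Streams KTable), and the field names are invented.

```python
from typing import Dict, Iterable, Iterator

# Latest state per key, built from the account and customer topics.
accounts: Dict[str, dict] = {}
customers: Dict[str, dict] = {}


def on_account_event(event: dict) -> None:
    accounts[event["account_id"]] = event  # keep only the latest version


def on_customer_event(event: dict) -> None:
    customers[event["customer_id"]] = event


def prejoin(transactions: Iterable[dict]) -> Iterator[dict]:
    """Enrich each transaction with account and customer attributes."""
    for txn in transactions:
        account = accounts.get(txn["account_id"], {})
        customer = customers.get(account.get("customer_id"), {})
        yield {
            **txn,
            "product": account.get("product"),
            "customer_id": account.get("customer_id"),
            "customer_segment": customer.get("segment"),
        }


on_customer_event({"customer_id": "c1", "segment": "student"})
on_account_event({"account_id": "a1", "customer_id": "c1", "product": "current-account"})
rows = list(prejoin([{"txn_id": "t1", "account_id": "a1", "amount": 350}]))
print(rows[0]["customer_segment"], rows[0]["product"])  # student current-account
```

Each enriched record would be published to the pre-joined topic that Pinot ingests, so filters such as customer segment become plain column filters.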
Ensuring correctness
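Deduplication
• Duplication is (obviously) not acceptable in core banking
• The outbox only guarantees at-least-once delivery:
  • A message is only marked as sent after it has been published to Kafka, outside of the database transaction, so a failure between the two steps causes it to be published again
• Pinot ingestion is exactly-once, but the ingestion component has no built-in deduplication, so duplicates already in the topic are ingested as separate rows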
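How the duplicate arises can be simulated directly. This hypothetical sketch uses a list of dicts as the outbox and a list as the Kafka topic; the crash flag models the relay process dying after the publish succeeds but before the row is marked as sent.

```python
class CrashBeforeMarkSent(Exception):
    """Models the relay process dying between publish and mark-sent."""


def relay(outbox, topic, crash_after_publish=False):
    """Publish unsent outbox rows to the topic, then mark them sent."""
    for row in outbox:
        if row["sent"]:
            continue
        topic.append((row["event_id"], row["payload"]))  # publish to "Kafka"
        if crash_after_publish:
            raise CrashBeforeMarkSent(row["event_id"])
        row["sent"] = True  # only reached if the process survives


outbox = [{"event_id": "p1", "payload": {"amount": 2500}, "sent": False}]
topic = []
try:
    relay(outbox, topic, crash_after_publish=True)
except CrashBeforeMarkSent:
    pass
relay(outbox, topic)  # restart: p1 is still unsent, so it is published again
print([event_id for event_id, _ in topic])  # ['p1', 'p1']
```

Pinot would then ingest both copies faithfully, which is why deduplication has to happen somewhere downstream.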
Pinot Upsert and how Pinot consumes in real time
• Low-level consumer ingestion has one consumer per topic partition
• This is multiplied by Pinot's replication factor: each replica consumes the partition independently
• Pinot upsert requires all records with the same primary key to be placed on the same partition
• Duplicate records can then be detected locally on the server that consumes that partition
• This gives an efficient way of deduplicating without a global coordinator
But there are some cons:
• Duplicates are only checked within a given time window
• Increasing the number of partitions in Kafka is problematic, because it changes which partition a key maps to
• Read consistency is not guaranteed
• Certain Pinot indexes cannot be used (e.g. star-tree)
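The idea behind partition-local deduplication, and why adding partitions hurts it, can be sketched as follows. This is a conceptual model, not Pinot's upsert code: Pinot tracks the latest record per primary key rather than dropping later ones, though for identical duplicates the effect is the same. CRC32 stands in for Kafka's key partitioner (which actually uses murmur2), and the class and function names are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's key partitioner (real Kafka uses murmur2, not CRC32).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionConsumer:
    """Consumes one partition and drops records whose primary key it has seen."""

    def __init__(self):
        self.seen = set()
        self.rows = []

    def consume(self, record: dict) -> None:
        if record["event_id"] in self.seen:
            return  # duplicate: already ingested on this server
        self.seen.add(record["event_id"])
        self.rows.append(record)


NUM_PARTITIONS = 4
consumers = [PartitionConsumer() for _ in range(NUM_PARTITIONS)]
events = [{"event_id": "p1", "amount": 2500},
          {"event_id": "p2", "amount": 100},
          {"event_id": "p1", "amount": 2500}]  # p1 delivered twice
for e in events:
    # Same key, same partition, same consumer: no global coordinator needed.
    consumers[partition_for(e["event_id"], NUM_PARTITIONS)].consume(e)
print(sum(len(c.rows) for c in consumers))  # 2

# Growing the topic from 4 to 6 partitions remaps many existing keys, so a
# later duplicate can reach a consumer that never saw the original.
keys = [f"p{i}" for i in range(100)]
moved = [k for k in keys if partition_for(k, 4) != partition_for(k, 6)]
print(len(moved))
```

The guarantee holds only while the key-to-partition mapping stays fixed, which is the root of the "increasing partitions is problematic" caveat.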
Deduplication – Subquery
• It is safest to deduplicate on the read side
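Read-side deduplication can be expressed as a subquery that collapses each event ID to a single row before aggregating. The talk does not show its exact query; this sketch uses SQLite as a stand-in for Trino or Pinot, and the table and column names are hypothetical. Taking MIN of each column is safe here only because duplicates are identical copies of the same event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (event_id TEXT, account_id TEXT, amount INTEGER);
INSERT INTO transactions VALUES
  ('p1', 'a1', 2500),
  ('p1', 'a1', 2500),   -- duplicate delivered by the at-least-once outbox
  ('p2', 'a1', 100),
  ('p3', 'a2', 700);
""")

naive = conn.execute(
    "SELECT account_id, SUM(amount) FROM transactions GROUP BY account_id ORDER BY account_id"
).fetchall()

deduplicated = conn.execute("""
SELECT account_id, SUM(amount)
FROM (
    -- Inner query: collapse each event_id to a single row.
    SELECT event_id, MIN(account_id) AS account_id, MIN(amount) AS amount
    FROM transactions
    GROUP BY event_id
)
GROUP BY account_id
ORDER BY account_id
""").fetchall()

print(naive)         # [('a1', 5100), ('a2', 700)]
print(deduplicated)  # [('a1', 2600), ('a2', 700)]
```

The cost is an extra grouping step on every query, in exchange for correctness that does not depend on ingestion-time guarantees.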
Deduplication – Subquery in Pinot
• Pinot can combine the two queries into a single query using ID sets (the IDSET aggregation function with the IN_SUBQUERY filter)
Takeaways
Domain-driven design and microservices have many benefits. Processes become less coupled, and product innovation can happen in a safer and more flexible way.
But data can become trapped inside domains, and building features that require data from multiple domains, such as product personalization, becomes difficult and entangled.
Event-based architectures are a great way to share data across domains, allowing datasets to be joined.
Apache Pinot provides the ability to perform real-time, customer-facing analytics and personalization without the dimension explosion typical of pre-aggregation solutions in stream processing.
Thank you

Presented at Kafka Summit London 2022 as "Enabling product personalisation using Apache Kafka, Apache Pinot and Trino" by Stuart Coleman.
