Tech view on Regulatory Compliance
MarkLogic User Group Benelux Meetup December 2016
Speaker: Alexander L. de Goeij
About me
• Architect / Consultant
• Financial Services: Core Trading
• Regulations: EMIR, MiFID II
• Architecture: Enterprise / Solution / Project Architect
• Consulting: IT Strategy, implementations, vendor selection, etc.
• Business degree, Tech addiction.
“Regulations really make my life more fun!”
As said by no-one, ever.
“Regulations really make my life more exciting!”
As said by everyone who gets to use cool databases!
The challenge we think we are facing:
[Diagram: a single pipeline in which Source Data is extracted, transformed and loaded into Some Application, which sends the report to a Happy Regulator]
The actual challenge we are facing:
[Diagram: Source Data is extracted and loaded separately into DB 1, Tool 2, through Thing N, plus a database you didn’t know still existed; the results reach the Happy Regulators via Email, FTP, REST and SOAP]
The current solution doesn’t work anymore:
• Auditability / process checks are now included in the regulations.
• Obligation to re-report.
• More complex ad-hoc requests from the regulator.
• Not suited for real-time reporting.
• Waste of money…
What do we need?
• Auditability: keep the original data in its original format to prove results, and keep track of ‘who-did-what’ with the data.
• Consistency: the regulator’s real-time requirement demands more than eventual consistency.
• Forward flexibility: we know we don’t know what we will have to report tomorrow.
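A minimal sketch of the auditability requirement, assuming a simple in-process store: originals are kept byte-for-byte and never overwritten, and every action on them is appended to a hash-chained log, so any later edit to an entry or an original is detectable. The class and method names (`AuditLog`, `ingest`, `record`, `verify`) are hypothetical and not taken from any product.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

GENESIS = "0" * 64  # prev_hash of the very first entry


@dataclass(frozen=True)
class AuditEntry:
    seq: int
    actor: str           # who
    action: str          # did what
    doc_id: str          # with which data
    timestamp: str       # when (UTC, ISO 8601)
    payload_sha256: str  # fingerprint of the original bytes
    prev_hash: str       # links this entry to the previous one
    entry_hash: str      # hash over all fields above


def _hash_entry(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Hypothetical store: originals kept byte-for-byte, actions hash-chained."""

    def __init__(self) -> None:
        self.originals: dict[str, bytes] = {}
        self.entries: list[AuditEntry] = []

    def _append(self, actor: str, action: str, doc_id: str) -> AuditEntry:
        body = {
            "seq": len(self.entries),
            "actor": actor,
            "action": action,
            "doc_id": doc_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "payload_sha256": hashlib.sha256(self.originals[doc_id]).hexdigest(),
            "prev_hash": self.entries[-1].entry_hash if self.entries else GENESIS,
        }
        entry = AuditEntry(entry_hash=_hash_entry(body), **body)
        self.entries.append(entry)
        return entry

    def ingest(self, actor: str, doc_id: str, original: bytes) -> AuditEntry:
        # Keep the source document exactly as received; never overwrite it.
        if doc_id in self.originals:
            raise ValueError(f"{doc_id} already ingested; originals are immutable")
        self.originals[doc_id] = original
        return self._append(actor, "ingest", doc_id)

    def record(self, actor: str, action: str, doc_id: str) -> AuditEntry:
        # Log every later use of the data ("who-did-what").
        return self._append(actor, action, doc_id)

    def verify(self) -> bool:
        # Recompute the chain; an edited entry or an altered original breaks it.
        prev = GENESIS
        for entry in self.entries:
            body = asdict(entry)
            stored_hash = body.pop("entry_hash")
            original = self.originals.get(entry.doc_id, b"")
            if (body["prev_hash"] != prev
                    or _hash_entry(body) != stored_hash
                    or hashlib.sha256(original).hexdigest() != entry.payload_sha256):
                return False
            prev = stored_hash
        return True
```

Typical use: `ingest` a source file once, `record` each report or query made from it, and call `verify()` when you need to prove to the regulator that neither the data nor its history has been changed.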
Looking to technology for a better answer!
Your favorite RDBMS
• ACID, consistent, and blazing fast (if you buy Exadata)
• Normalize your way out, and fail.
• Not fit for processing/reporting across different data objects, e.g. Trades and Mortgages
• Try to do NoSQL with SQL (innovative, but terribly slow and impossible to maintain)
Example of what not to do:
[Screenshots: two SQL examples]
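The SQL examples on this slide were images and are not recoverable. As a hedged illustration of “trying to do NoSQL with SQL”, here is one common pattern of that kind (an assumption, not necessarily what the slide showed): a generic entity-attribute-value (EAV) table in SQLite that stores Trades and Mortgages side by side. The table and attribute names are hypothetical.

```python
import sqlite3

# Hypothetical EAV table: one generic table for every kind of object,
# so Trades and Mortgages can share storage without a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity_id TEXT, entity_type TEXT, attr TEXT, value TEXT)")
rows = [
    ("T1", "trade", "notional", "1000000"),
    ("T1", "trade", "currency", "EUR"),
    ("T1", "trade", "counterparty", "BANK-A"),
    ("M1", "mortgage", "principal", "250000"),
    ("M1", "mortgage", "currency", "EUR"),
    ("M1", "mortgage", "borrower", "J. Doe"),
]
conn.executemany("INSERT INTO eav VALUES (?, ?, ?, ?)", rows)

# Rebuilding records needs one CASE expression per attribute, and every new
# attribute means editing the query. All values are stored as text, so
# numbers need explicit casts, and the engine cannot index them usefully.
query = """
SELECT entity_id,
       MAX(CASE WHEN attr = 'currency'  THEN value END)                  AS currency,
       MAX(CASE WHEN attr = 'notional'  THEN CAST(value AS INTEGER) END) AS notional,
       MAX(CASE WHEN attr = 'principal' THEN CAST(value AS INTEGER) END) AS principal
FROM eav
GROUP BY entity_id
ORDER BY entity_id
"""
result = conn.execute(query).fetchall()
```

It works on six rows; on millions of rows, with dozens of attributes per object type, the self-pivoting queries are what becomes slow and hard to maintain.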
MongoDB
• Free! Open source! GridFS!
• Data has to be transformed on ingest (to JSON), as most of our data is XML
• Eventual consistency (AKA data loss) means no real-time reporting.
• Good at homogeneous data.
• Still master-slave, with scaling issues
• Brilliant for RAD / prototyping!
Where things go wrong:
Source: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
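To make “transform on ingest” concrete, here is a minimal sketch of one ad-hoc XML-to-JSON convention in Python’s standard library. The trade message is a hypothetical, simplified example (not a real FpML document), and the `@`/`#text` naming is just one of several conventions in use.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified trade message (not a real FpML document).
SOURCE_XML = """<trade id="T1">
  <notional currency="EUR">1000000</notional>
  <party role="buyer">BANK-A</party>
  <party role="seller">BANK-B</party>
</trade>"""


def element_to_dict(elem: ET.Element) -> dict:
    """One ad-hoc convention: attributes become '@name' keys, text becomes
    '#text', and repeated child tags are collected into lists."""
    node: dict = {f"@{k}": v for k, v in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Second occurrence of a tag: switch that key to a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


doc = {"trade": element_to_dict(ET.fromstring(SOURCE_XML))}
as_json = json.dumps(doc, indent=2)
```

Note the design consequences: a trade with one `<party>` yields a dict while a trade with two yields a list, so consumers see different shapes; comments, namespaces prefixes and interleaved mixed content are not preserved. That is why auditability also requires keeping the original XML, not only the converted JSON.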
Cassandra (DataStax)
• Favors data duplication over normalization
• Very fast (if you duplicate well), but does not do JOINs
• Used by ING as the main component of their risk grid (YouTube)
• Excellent for time series data
Source: https://academy.datastax.com/resources/getting-started-time-series-data-modeling
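A minimal in-memory simulation of the two points above (not the Cassandra driver API, and the table names are hypothetical): time series rows are partitioned by a compound key that includes a day bucket and kept sorted by timestamp inside the partition, and the same trade is written to a second table keyed for another query, because there are no JOINs to combine tables at read time.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime

Row = tuple[datetime, str, float]  # (timestamp, trade_id, price)

# "Table" 1: partition key (instrument, day); rows clustered by timestamp.
trades_by_instrument_day: dict[tuple[str, str], list[Row]] = defaultdict(list)
# "Table" 2: partition key (counterparty,); same data, duplicated on purpose.
trades_by_counterparty: dict[str, list[Row]] = defaultdict(list)


def write_trade(trade_id: str, instrument: str, counterparty: str,
                ts: datetime, price: float) -> None:
    day = ts.strftime("%Y-%m-%d")  # day bucket keeps each partition bounded
    row = (ts, trade_id, price)
    # Write the same trade to both tables: each query then hits one partition.
    insort(trades_by_instrument_day[(instrument, day)], row)
    insort(trades_by_counterparty[counterparty], row)


def trades_for_instrument(instrument: str, day: str,
                          start: datetime, end: datetime) -> list[Row]:
    # Equivalent of a range query on the clustering column within a single
    # partition; the caller must know the partition key (instrument, day).
    partition = trades_by_instrument_day.get((instrument, day), [])
    return [row for row in partition if start <= row[0] < end]
```

The trade-off is the one the slide names: reads are cheap because each query is served from one pre-sorted partition, at the cost of writing every trade more than once and designing tables around known queries in advance.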
Hadoop
Source: http://hortonworks.com/products/data-center/hdp/
MarkLogic
• Focused on heterogeneously structured data
• Bitemporal, if you dare
• Semantics / RDF triples
• ACID, consistent, stores the original file
• ABAC & redaction in the enterprise version
• Rules, workflows, alerts, triggers
• Not a COTS product!
Ok, so now what?
Two approaches to a solution
Infra approach:
• Build everything yourself, using open-source components
E.g.:
• Hadoop
• Cassandra + Kafka
Platform approach:
• Focus on application and business logic, not on infra
E.g.:
• MarkLogic
• Spark (without Hadoop)
Infra approach (SMACK example)
• Used (and designed) by Netflix, LinkedIn, Uber, Twitter
• Massive amounts of event processing (IoT)
• HA and geo-distributed
• Scala, Python, R, Java(Script)
• Asynchronous everywhere
• Near impossible to destroy: reactive, self-healing, back-pressure.
[Diagram: SMACK stack on Mesos OS (with Marathon and Zookeeper) spread across several bare-metal nodes, running Play REST APIs, Akka Actors, Kafka, Spark and multiple Cassandra nodes]
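Back-pressure is one reason the slide calls this stack “near impossible to destroy”: a slow consumer slows the producer down instead of being flooded. Akka Streams implements this natively; the sketch below is only a language-neutral illustration using Python’s `asyncio.Queue` (not Akka’s or Kafka’s API), where a bounded queue makes `put()` wait whenever the consumer falls behind. The event counts and delay are arbitrary.

```python
import asyncio


async def producer(queue: asyncio.Queue, n_events: int, log: list) -> None:
    for i in range(n_events):
        # put() suspends while the queue is full: the fast producer is paced
        # by the slow consumer instead of piling events up in memory.
        await queue.put(i)
        log.append(queue.qsize())  # backlog right after each put
    await queue.put(None)  # sentinel: no more events


async def consumer(queue: asyncio.Queue, processed: list, delay: float) -> None:
    while True:
        event = await queue.get()
        if event is None:
            break
        await asyncio.sleep(delay)  # simulate slow downstream work
        processed.append(event)


async def run(n_events: int = 20, capacity: int = 3, delay: float = 0.001):
    queue: asyncio.Queue = asyncio.Queue(maxsize=capacity)
    backlog: list = []
    processed: list = []
    await asyncio.gather(producer(queue, n_events, backlog),
                         consumer(queue, processed, delay))
    return processed, max(backlog)


processed, max_backlog = asyncio.run(run())
```

Every event is processed in order, and the backlog never exceeds the queue’s capacity, however fast the producer runs.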
Platform approach
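[Diagram: Source Data (qualitative and quantitative) flows through MarkLogic, a time series database (“Insert Time Series Database here”) and Spark, in stages labelled Data Flows, Data Stores, Analytics and Feedback Loop, ending at a Happy Regulator]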
• Schema transformations
• Business Rules
• Workflow
• Rights management
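A minimal sketch of three of the platform responsibilities listed above (workflow is left out), assuming plain Python dicts as records: a schema transformation into a canonical model, business rules kept as data so new reporting requirements add rules rather than code, and attribute-based redaction for rights management. Every field name, rule and user attribute here is hypothetical; real regulatory schemas such as EMIR or MiFID II define their own fields.

```python
import re
from typing import Callable

# Schema transformation: hypothetical source field names -> canonical names.
FIELD_MAP = {"TradeRef": "trade_id", "Ntnl": "notional",
             "Ccy": "currency", "CptyName": "counterparty"}


def transform(source: dict) -> dict:
    # Unmapped source fields are dropped from the canonical record
    # (the original is assumed to be stored separately for audit).
    return {FIELD_MAP[k]: v for k, v in source.items() if k in FIELD_MAP}


# Business rules as data: (rule name, predicate). A new requirement adds
# an entry here instead of changing the pipeline code.
RULES: list[tuple[str, Callable[[dict], bool]]] = [
    ("notional must be positive",
     lambda r: float(r.get("notional", 0)) > 0),
    ("currency must be three upper-case letters",
     lambda r: re.fullmatch(r"[A-Z]{3}", str(r.get("currency", ""))) is not None),
]


def check_rules(record: dict) -> list[str]:
    # Names of all rules the record violates, in rule order.
    return [name for name, rule in RULES if not rule(record)]


# Rights management: a field is visible only to users holding the
# (hypothetical) attribute it requires; otherwise it is redacted.
FIELD_REQUIRES = {"counterparty": "can_see_client_data"}


def redact(record: dict, user_attrs: set[str]) -> dict:
    out = {}
    for key, value in record.items():
        required = FIELD_REQUIRES.get(key)
        out[key] = value if required is None or required in user_attrs else "REDACTED"
    return out
```

Keeping rules and access policies as data rather than code is one way to get the “forward flexibility” asked for earlier: tomorrow’s report may need new rules, not a new pipeline.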
Main take-aways
• There are no one-stop solutions
• Don’t pick bleeding-edge stuff if you need it to work
• Focus on the business benefit of investing in regulatory compliance
• Separate the platform from the project!
• Start small, think big
Thank you for listening!
Alexander L. de Goeij
alexander@aldg.nl
References
• https://academy.datastax.com/resources/getting-started-time-series-data-modeling
• http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
• http://hortonworks.com/products/data-center/hdp/
• https://www.linkedin.com/pulse/data-hubs-marklogic-vs-hadoop-kurt-cagle
• https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin
• http://www.datanami.com/2015/10/05/how-uber-uses-spark-and-hadoop
• https://blog.twitter.com/2015/handling-five-billion-sessions-a-day-in-real-time
• http://techblog.netflix.com/2013/12/announcing-suro-backbone-of-netflix.html

