Fast data is the key
to next-gen capital
and risk management
By Deenar Toraskar
Founder, ThinkReactive
Agenda
Introduction
Business Background
Technology Impact
Big Data - Redux
Big Data - Relevant Technologies
Challenges
Recommendations
Me and Spark
•Hands on
•Many years and many roles at an investment bank
•20 years of JVM experience
•10+ years of risk expertise
•5 years of big data: Hadoop, Hive, Spark, Scala
•Founder of Think Reactive - risk analytics solutions
• FRTB market and credit risk, stress testing, IFRS 9 solutions
Background
• Increased capital requirements
• More capital charges
• Capital cost is now a large part of
PnL
• Capital charges are complex
• Capital charges are portfolio level
• More scenarios are required
Increased Regulatory Measures
VaR (general + specific)
Stressed VaR
CRM
Floor
Standardized Charge
Incremental Risk Charge
Hypothetical Backtesting
CVA
WWR (Wrong Way Risk)
Stressed EPE
Liquidity Coverage Ratios
Net Stable Funding Ratio
Portfolio Stress Testing
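At the data level, most of the measures above reduce to percentiles or aggregates over large scenario PnL vectors. As a minimal sketch, here is historical-simulation VaR in plain Python (the deck's stack is Scala + Spark; Python is used here for brevity, and the scenario numbers are made up):

```python
def historical_var(pnl, confidence=0.99):
    """One-day VaR by historical simulation: the loss not exceeded
    at the given confidence level, read off the empirical PnL distribution."""
    if not pnl:
        raise ValueError("need at least one PnL observation")
    ordered = sorted(pnl)                       # worst loss first
    idx = int((1 - confidence) * len(ordered))  # tail index
    return -ordered[idx]                        # VaR reported as a positive loss

# Illustrative scenario PnL vector; a stressed VaR run is the same
# calculation over a stressed scenario window.
scenario_pnl = [-12.5, -3.1, 0.4, 2.2, -7.8, 1.9, -0.6, 3.3, -1.2, 0.9]
print(historical_var(scenario_pnl))  # → 12.5
```

Portfolio-level measures make this expensive: the percentile must be taken over the aggregated book-level vector, not per trade, which is what pushes these calculations towards distributed engines.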
Business Impact
•Regulatory capital costs are now a significant part of
trading profit and loss.
•The emphasis on risk management is changing and
growing.
•Risk functions are now an integral part of trading, and risk
has moved from a back-office to a corner-office function.
•Exposures are monitored proactively, and risk measures
are recalculated continuously throughout the trading day.
•In addition, regulatory risk impact is now a critical part
of pre-trade decision making.
Technology Impact - Perfect Storm
• Volumes - More measures, more scenarios, more
granular, portfolio level calculations
• Velocity - More frequent, more timely, near-real-time
view required
• Variety - complex measures, trade level, more
reference and market data required, structured
products
• Veracity - BCBS lays down general principles for
management of risk data sets such as
completeness, traceability, accuracy, validation,
reconciliation and integrity
Current State
Current State - Suboptimal technologies
Disparate systems for risk calculation, distribution, aggregation and reporting
and market data using compute grids, in memory data fabrics, messaging
infrastructure, relational databases and shared file systems
Why - having to support granular mutable state makes both databases and
in-memory data grids hard to scale
Compute grids are great for distributing compute tasks, but often leave the
compute capacity underutilized as the data distribution infrastructure to
and from the compute grids does not keep up with demand.
Relational databases have limited support for semi-structured data and
complex data structures representing pricing (and risk) library inputs and
outputs. This often leads to only summary results being persisted, with the
detailed results discarded.
Current State - continued
Many moving parts
Custom formats - conversions, serialisation,
deserialisation multiple times
Many network hops - data being copied
multiple times
Databases used for storage - hard to scale,
suboptimal, fixed schemas
Big Data - Redux
Big Data History
Traditional systems, and the data management techniques associated with
them, have failed to scale to Big Data.
To tackle the challenges of Big Data, a new breed of technologies has
emerged. Many of these new technologies have been grouped under the
term Big Data.
These systems can scale to vastly larger data sets, but using these
technologies effectively requires a fundamentally new set of techniques.
They aren’t one-size-fits-all solutions.
Many of these Big Data systems were pioneered by Internet giants (Google,
Yahoo, Amazon, Facebook, LinkedIn, etc.), including distributed
filesystems, the MapReduce computation framework, key-value stores,
message buses and distributed locking services.
Big Data - How do they handle the 4Vs?
Specialised systems:
HDFS vs SAN/NFS
Cassandra vs databases
Kafka vs MQ/message queues
Spark vs in-memory data grids (IMDGs)
Built to scale, partitioning/sharding out of the box
Resilient
Sharing
Cloud friendly
Primarily open source
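"Partitioning/sharding out of the box" ultimately rests on deterministic key placement: every node agrees, without coordination, on which partition owns a key. A minimal sketch of stable hash partitioning in Python (key names hypothetical):

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32), so that
    writers and readers independently agree where the key lives."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical trade identifiers placed across 8 partitions.
trades = ["trade-001", "trade-002", "trade-003"]
placement = {t: partition_for(t, 8) for t in trades}
print(placement)
```

A stable hash (rather than Python's built-in, process-salted `hash`) is what makes the placement reproducible across processes and restarts.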
Serialization Formats
Google Protocol Buffers
Thrift
Avro
Native support for complex data types (arrays, vectors,
matrices, maps, structures), evolvable schemas
Standard interfaces, viz. Hive SerDe
Out-of-the-box tool support, viz. RPC, ETL, SQL-on-Hadoop
File/Storage Formats
Avro - splittable, evolvable schema
Parquet/ORC - columnar, high compression, predicate
pushdown, optimised for data warehouses
Both support the same rich types
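As a sketch of what an evolvable schema looks like in practice, a minimal Avro record schema (Avro schemas are plain JSON; the field names here are hypothetical). Adding the `scenarioSet` field with a default means data written under the old schema remains readable under the new one:

```json
{
  "type": "record",
  "name": "RiskResult",
  "fields": [
    {"name": "tradeId", "type": "string"},
    {"name": "pnlVector", "type": {"type": "array", "items": "double"}},
    {"name": "greeks", "type": {"type": "map", "values": "double"}},
    {"name": "scenarioSet", "type": "string", "default": "base"}
  ]
}
```

The array and map fields illustrate the rich types these formats support natively, which is exactly what relational stores struggle with for pricing library outputs.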
Standard Interfaces and Interoperable Components
Input/Output formats
SerDe (Hive serialisation/deserialisation interface)
Spark Data Sources
Hive Metastore
Quicker time to market, increased productivity
Scala + Spark = Big Data DSL
Scala + Spark + HDFS Interfaces
concise, versatile (ETL, aggregation, compute, pluggable
storage, query, caching, etc.)
Standards in Open source/Hadoop ecosystem
Extensible
Enables agility
New algorithms/patterns
Machine learning
Natural language processing
Deep learning
Graph analytics
Stream processing
Probabilistic data structures
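Probabilistic data structures trade exactness for constant memory, which is why they show up in big data stacks. A minimal Bloom filter sketch in Python (sizes and keys illustrative): membership tests can return false positives but never false negatives.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: constant-memory set membership with a tunable
    false-positive rate and no false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, item: str):
        # Derive k independent positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("trade-001")
print(bf.might_contain("trade-001"))  # True
print(bf.might_contain("trade-999"))  # almost certainly False
```

Production systems use hardened implementations (e.g. in Cassandra and Parquet row-group filters); this sketch only shows the mechanism.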
Polyglot Persistence - New Storage Solutions
Pick the best storage for your data sets, mix and match:
HDFS - large data sets at rest, colocated compute
NoSQL - key-value, mutable state, evolvable schemas
Kafka - high-throughput, distributed, publish-subscribe messaging
Elasticsearch/Solr - text, semi-structured, search-optimised
Existing relational stores
Other - graph databases and custom sources
Interactive Notebooks
A notebook is a web application that lets you create and share documents that
contain live code, equations, visualizations and explanatory text. It can be used for
data cleaning and transformation, numerical simulation, statistical modeling,
machine learning and much more.
Versatile tool for product development
Quick prototypes
Risk reduction
Early visibility and feedback from end users
Supports agile methodology
Big Data - Capital Management
Made for each other
Decision
Option 1: change the existing systems to satisfy the minimum
regulatory requirements.
Not as low risk as it appears: legacy platforms are being stretched to their limits and breaking points.
Limited capital management benefits.
Option 2: migrate to big data.
Meets regulatory requirements.
A platform for efficient capital management provides competitive advantages.
Cost benefits.
Big Data - Challenges
Why is Big Data hard?
Rapidly changing: 3 generations of the Hadoop stack in 4-5 years
Requires a combination of skills and expertise - JVM, databases, functional
programming, distributed systems
HDFS is different from databases - immutable and distributed
Data modelling expertise (model around queries; data duplication is good)
Functional programming - immutable data sets - Scala and Spark
All this + agile mindset + domain expertise is hard to find
Common causes of Big Data project failures
Focus on technology, not solutions. The incentive seems to be to get a big data solution to market,
rather than to meet strategic objectives or deliver business benefits. This can lead to inadequate
solutions that don’t hit the mark with users and won’t move the needle on business
performance.
Improper use of big data technology. The big data stack is not suited to all problems: e.g. for storing
configuration, a relational database is better than HBase; where multi-dimensional access is required,
a traditional warehouse such as an OLAP cube or Teradata is a better fit.
Focus on big data technology, not the best-of-breed technology choice. In many cases there are
existing open source components that are better than components branded “Big Data”, e.g. for
workflow there are better choices than Oozie.
Vendor dependence. Vendors often propose a solution that encourages lock-in to their distribution,
rather than the most appropriate solution for the problem in hand.
Skills. Very few people have the required mix of skills, solutions focus and domain expertise. Skills
need to be built up through real-world experience on projects in production.
Lots of Execution Risks
Specification Risk - Are the users asking for what they want?
People Risk - Can you get the right people at the right time?
Skills Risk - Do they have the right mix of skills?
Technical Risk - Are the right components selected?
Integration Risk - Will the components work together?
Opportunity Cost - Failure to get to market.
Recommendations
Be strictly agile
Agile - Best Governance Model
Early and continuous delivery of valuable software. Working software is the
primary measure of progress
Cross functional teams
Frequent communication
Technical excellence and good design enhance agility
Simplicity--the art of maximizing the amount of work not done--is essential
The best architectures, requirements, and designs emerge from
self-organizing teams
Retrospectives - feedback loop, continuous improvement.
Focussed Simple Stack
Introduce a new technology only if it adds business benefit
Risk vs. Value judgement
Deep expertise helps, reduces people risk
Start with use cases where the Big Data
stack has a greater competitive
advantage
Large data sets at rest
Rich data types and evolvable schema
Colocated data + compute
PnL Store, Scenario set (market data) stores
PnL Aggregations (Credit + Market Risk)
Stress Testing
Portfolio Calculations (IRC, Standardized Charge)
Warehouse off boarding
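The PnL aggregation use cases above boil down to element-wise sums of trade-level scenario PnL vectors rolled up to book or portfolio level, which is exactly the shape of work that benefits from colocated data and compute. A plain-Python sketch (trade and book names hypothetical; in the deck's stack this would be a Spark aggregation):

```python
from collections import defaultdict

def aggregate_pnl(trade_vectors, book_of):
    """Roll trade-level PnL vectors up to book level by element-wise
    addition -- the basic aggregation behind portfolio VaR and stress runs."""
    books = defaultdict(list)
    for trade, vector in trade_vectors.items():
        book = books[book_of[trade]]
        if not book:
            book.extend(vector)
        else:
            for i, v in enumerate(vector):
                book[i] += v
    return dict(books)

trades = {"t1": [1.0, -2.0, 3.0], "t2": [0.5, 0.5, -1.0], "t3": [2.0, 2.0, 2.0]}
books = {"t1": "rates", "t2": "rates", "t3": "credit"}
print(aggregate_pnl(trades, books))  # → {'rates': [1.5, -1.5, 2.0], 'credit': [2.0, 2.0, 2.0]}
```

Because vector addition is associative and commutative, the same roll-up distributes cleanly across partitions and then across nodes.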
Use Market Driven Technology Selection
Github commits
Mailing list traffic
Number of contributors
Job Postings
Google keyword searches
Real-world adoption vs vendor-developed products
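One way to make market-driven selection concrete is a weighted score over normalised adoption signals. A hypothetical sketch in Python (the weights and metric values are illustrative, not real data):

```python
def adoption_score(metrics, weights):
    """Combine public adoption signals (commits, contributors,
    job postings, searches) into one comparable number per technology.
    Inputs are assumed normalised to [0, 1]."""
    return sum(weights[k] * metrics.get(k, 0.0) for k in weights)

# Hypothetical, illustrative numbers only.
weights = {"github_commits": 0.3, "contributors": 0.3, "job_postings": 0.4}
candidate = {"github_commits": 0.9, "contributors": 0.8, "job_postings": 0.7}
print(round(adoption_score(candidate, weights), 2))  # → 0.79
```

The score itself matters less than tracking the same signals consistently across the candidates being compared.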
Use best solution fit for purpose
Just because a technology is part of the big data stack does
not mean it is good.
Evaluate big data technologies as you would any other
technology choice.
Pick and choose the best from the options available, not
just the best-fitting big data solution.
e.g. HBase
Use big data to test hypotheses
Run full scale tests quickly
Reverse engineer quickly rather than analyse all paths
upfront
Recommendations
Be strictly agile
Reduced time to market for delivering risk management
Driven by business benefit
Simplify Stack
Spark + Scala = Big Data DSL
Add abstractions later
Use market information to evaluate technologies
Leverage big data competitive advantages
Scala + Spark = Big Data DSL
Scala + Spark + HDFS Interfaces
concise, versatile (ETL, aggregation, compute, pluggable
storage, query, caching, etc.)
Standards in Open source/Hadoop ecosystem
Extensible
Enables agility
Defining an MVP is hard to get right
Thank You
•Rate my session
•Feel free to reach out
•Ready-to-go risk analytics

BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient Capital Management
