Fast data is the key
to next-gen capital
and risk management
By Deenar Toraskar
Founder, ThinkReactive
Agenda
Introduction
Business Background
Technology Impact
Big Data - Redux
Big Data - Relevant Technologies
Challenges
Recommendations
Me and Spark
•Hands on
•Many years and many roles at an investment bank
•20 years of JVM experience
•10+ years of risk expertise
•5 years of big data: Hadoop, Hive, Spark, Scala
•Founder of Think Reactive - risk analytics solutions
• FRTB market and credit risk, stress testing, IFRS 9 solutions
Background
• Increased capital requirements
• More capital charges
• Capital cost is now a large part of
PnL
• Capital charges are complex
• Capital charges are portfolio level
• More scenarios are required
Increased Regulatory Measures
VaR (general + specific)
Stressed VaR
CRM
Floor
Standardized Charge
Incremental Risk Charge
Hypothetical Backtesting
CVA
WWR (Wrong Way Risk)
Stressed EPE
Liquidity Coverage Ratios
Net Stable Funding Ratio
Portfolio Stress Testing
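At the data level, most of the measures above reduce to percentiles or aggregates over large scenario PnL vectors. As a minimal sketch, here is historical-simulation VaR in plain Python (the deck's stack is Scala + Spark; Python is used here for brevity, and the scenario numbers are made up):

```python
def historical_var(pnl, confidence=0.99):
    """One-day VaR by historical simulation: the loss not exceeded
    at the given confidence level, read off the empirical PnL distribution."""
    if not pnl:
        raise ValueError("need at least one PnL observation")
    ordered = sorted(pnl)                       # worst loss first
    idx = int((1 - confidence) * len(ordered))  # tail index
    return -ordered[idx]                        # VaR reported as a positive loss

# Illustrative scenario PnL vector; a stressed VaR run is the same
# calculation over a stressed scenario window.
scenario_pnl = [-12.5, -3.1, 0.4, 2.2, -7.8, 1.9, -0.6, 3.3, -1.2, 0.9]
print(historical_var(scenario_pnl))  # → 12.5
```

Portfolio-level measures make this expensive: the percentile must be taken over the aggregated book-level vector, not per trade, which is what pushes these calculations towards distributed engines.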
Business Impact
•Regulatory capital costs are now a significant part of
trading profit and loss.
•The emphasis on risk management is changing and
growing.
•Risk functions are now an integral part of trading, and risk
has moved from a back-office to a corner-office function.
•Exposures are monitored proactively, and risk measures
are recalculated continuously throughout the trading day.
•In addition, regulatory risk impact is now a critical part
of pre-trade decision making.
Technology Impact - Perfect Storm
• Volumes - More measures, more scenarios, more
granular, portfolio level calculations
• Velocity - More frequent, more timely, near-real-time
view required
• Variety - complex measures, trade level, more
reference and market data required, structured
products
• Veracity - BCBS lays down general principles for
management of risk data sets such as
completeness, traceability, accuracy, validation,
reconciliation and integrity
Current State
Current State - Suboptimal technologies
Disparate systems for risk calculation, distribution, aggregation and reporting
and market data using compute grids, in memory data fabrics, messaging
infrastructure, relational databases and shared file systems
Why - having to support granular mutable state makes both databases and
in-memory data grids hard to scale
Compute grids are great for distributing compute tasks, but often leave the
compute capacity underutilized as the data distribution infrastructure to
and from the compute grids does not keep up with demand.
Relational databases have limited support for semi-structured data and
complex data structures representing pricing (and risk) library inputs and
outputs. This often leads to only summary results being persisted, with the
detailed results discarded.
Current State - continued
Many moving parts
Custom formats - conversions, serialisation,
deserialisation multiple times
Many network hops - data being copied
multiple times
Databases used for storage - hard to scale,
suboptimal, fixed schemas
Big Data - Redux
Big Data History
Traditional systems, and the data management techniques associated with
them, have failed to scale to Big Data.
To tackle the challenges of Big Data, a new breed of technologies has
emerged. Many of these new technologies have been grouped under the
term Big Data.
These systems can scale to vastly larger data sets, but using these
technologies effectively requires a fundamentally new set of techniques.
They aren’t one-size-fits-all solutions.
Many of these Big Data systems were pioneered by Internet giants (Google,
Yahoo, Amazon, Facebook, LinkedIn, etc.), including distributed
filesystems, the MapReduce computation framework, key-value stores,
message buses and distributed locking services.
Big Data - How do they handle the 4Vs?
Specialised systems:
HDFS vs SAN/NFS
Cassandra vs databases
Kafka vs MQ/message queues
Spark vs in-memory data grids (IMDGs)
Built to scale, partitioning/sharding out of the box
Resilient
Sharing
Cloud friendly
Primarily open source
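"Partitioning/sharding out of the box" ultimately rests on deterministic key placement: every node agrees, without coordination, on which partition owns a key. A minimal sketch of stable hash partitioning in Python (key names hypothetical):

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32), so that
    writers and readers independently agree where the key lives."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical trade identifiers placed across 8 partitions.
trades = ["trade-001", "trade-002", "trade-003"]
placement = {t: partition_for(t, 8) for t in trades}
print(placement)
```

A stable hash (rather than Python's built-in, process-salted `hash`) is what makes the placement reproducible across processes and restarts.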
Serialization Formats
Google Protocol Buffers
Thrift
Avro
Native support for complex data types (arrays, vectors,
matrices, maps, structures), evolvable schemas
Standard interfaces, viz. Hive SerDe
Out-of-the-box tool support, viz. RPC, ETL, SQL-on-Hadoop
File/Storage Formats
Avro - splittable, evolvable schema
Parquet/ORC - columnar, high compression, predicate
pushdown, optimised for data warehouses
Both support the same rich types
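As a sketch of what an evolvable schema looks like in practice, a minimal Avro record schema (Avro schemas are plain JSON; the field names here are hypothetical). Adding the `scenarioSet` field with a default means data written under the old schema remains readable under the new one:

```json
{
  "type": "record",
  "name": "RiskResult",
  "fields": [
    {"name": "tradeId", "type": "string"},
    {"name": "pnlVector", "type": {"type": "array", "items": "double"}},
    {"name": "greeks", "type": {"type": "map", "values": "double"}},
    {"name": "scenarioSet", "type": "string", "default": "base"}
  ]
}
```

The array and map fields illustrate the rich types these formats support natively, which is exactly what relational stores struggle with for pricing library outputs.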
Standard Interfaces and Interoperable Components
Input/Output formats
SerDe (Hive serialisation/deserialisation interface)
Spark Data Sources
Hive Metastore
Quicker time to market, increased productivity
Scala + Spark = Big Data DSL
Scala + Spark + HDFS Interfaces
concise, versatile (ETL, aggregation, compute, pluggable
storage, query, caching, etc.)
Standards in Open source/Hadoop ecosystem
Extensible
Enables agility
New algorithms/patterns
Machine learning
Natural language processing
Deep learning
Graph analytics
Stream processing
Probabilistic data structures
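Probabilistic data structures trade exactness for constant memory, which is why they show up in big data stacks. A minimal Bloom filter sketch in Python (sizes and keys illustrative): membership tests can return false positives but never false negatives.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: constant-memory set membership with a tunable
    false-positive rate and no false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, item: str):
        # Derive k independent positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("trade-001")
print(bf.might_contain("trade-001"))  # True
print(bf.might_contain("trade-999"))  # almost certainly False
```

Production systems use hardened implementations (e.g. in Cassandra and Parquet row-group filters); this sketch only shows the mechanism.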
Polyglot Persistence - New Storage Solutions
Pick the best storage for your data sets, mix and match:
HDFS - large data sets at rest, colocated compute
NoSQL - key-value, mutable state, evolvable schemas
Kafka - high-throughput, distributed, publish-subscribe messaging
Elasticsearch/Solr - text, semi-structured, search-optimised
Existing relational stores
Other - graph databases and custom sources
Interactive Notebooks
A notebook is a web application that lets you create and share documents that
contain live code, equations, visualizations and explanatory text. It can be used for
data cleaning and transformation, numerical simulation, statistical modeling,
machine learning and much more.
Versatile tool for product development
Quick prototypes
Risk reduction
Early visibility and feedback from end users
Supports agile methodology
Big Data - Capital Management
Made for each other
Decision
Option 1: change the existing systems to satisfy the minimum
regulatory requirements.
Not as low risk as it appears: legacy platforms are being stretched to their limits and breaking points.
Limited capital management benefits.
Option 2: migrate to big data.
Meets regulatory requirements.
A platform for efficient capital management provides competitive advantages.
Cost benefits.
Big Data - Challenges
Why is Big Data hard?
Rapidly changing: 3 generations of the Hadoop stack in 4-5 years
Requires a combination of skills and expertise - JVM, databases, functional
programming, distributed systems
HDFS is different from databases - immutable and distributed
Data modelling expertise (model around queries; data duplication is good)
Functional programming - immutable data sets - Scala and Spark
All this + agile mindset + domain expertise is hard to find
Common causes of Big Data project failures
Focus on technology, not solutions. The incentive seems to be to get a big data solution to market,
rather than to meet strategic objectives or deliver business benefits. This can lead to inadequate
solutions that don’t hit the mark with users and won’t move the needle on business
performance.
Improper use of big data technology. The big data stack is not suited to all problems: e.g. for storing
configuration, a relational database is better than HBase; where multi-dimensional access is required,
a traditional warehouse such as an OLAP cube or Teradata is a better fit.
Focus on big data technology, not the best-of-breed technology choice. In many cases there are
existing open source components that are better than components branded “Big Data”, e.g. for
workflow there are better choices than Oozie.
Vendor dependence. Vendors often propose a solution that encourages lock-in to their distribution,
rather than the most appropriate solution for the problem in hand.
Skills. Very few people have the required mix of skills, solutions focus and domain expertise. Skills
need to be built up through real-world experience on projects in production.
Lots of Execution Risks
Specification Risk - Are the users asking for what they want?
People Risk - Can you get the right people at the right time?
Skills Risk - Do they have the right mix of skills?
Technical Risk - Are the right components selected?
Integration Risk - Will the components work together?
Opportunity Cost - Failure to get to market.
Recommendations
Be strictly agile
Agile - Best Governance Model
Early and continuous delivery of valuable software. Working software is the
primary measure of progress
Cross functional teams
Frequent communication
Technical excellence and good design enhance agility
Simplicity--the art of maximizing the amount of work not done--is essential
The best architectures, requirements, and designs emerge from
self-organizing teams
Retrospectives - feedback loop, continuous improvement.
Focussed Simple Stack
Introduce a new technology only if it adds business benefit
Risk vs. Value judgement
Deep expertise helps, reduces people risk
Start with use cases where the Big Data
stack has a greater competitive
advantage
Large data sets at rest
Rich data types and evolvable schema
Colocated data + compute
PnL Store, Scenario set (market data) stores
PnL Aggregations (Credit + Market Risk)
Stress Testing
Portfolio Calculations (IRC, Standardized Charge)
Warehouse off boarding
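The PnL aggregation use cases above boil down to element-wise sums of trade-level scenario PnL vectors rolled up to book or portfolio level, which is exactly the shape of work that benefits from colocated data and compute. A plain-Python sketch (trade and book names hypothetical; in the deck's stack this would be a Spark aggregation):

```python
from collections import defaultdict

def aggregate_pnl(trade_vectors, book_of):
    """Roll trade-level PnL vectors up to book level by element-wise
    addition -- the basic aggregation behind portfolio VaR and stress runs."""
    books = defaultdict(list)
    for trade, vector in trade_vectors.items():
        book = books[book_of[trade]]
        if not book:
            book.extend(vector)
        else:
            for i, v in enumerate(vector):
                book[i] += v
    return dict(books)

trades = {"t1": [1.0, -2.0, 3.0], "t2": [0.5, 0.5, -1.0], "t3": [2.0, 2.0, 2.0]}
books = {"t1": "rates", "t2": "rates", "t3": "credit"}
print(aggregate_pnl(trades, books))  # → {'rates': [1.5, -1.5, 2.0], 'credit': [2.0, 2.0, 2.0]}
```

Because vector addition is associative and commutative, the same roll-up distributes cleanly across partitions and then across nodes.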
Use Market Driven Technology Selection
Github commits
Mailing list traffic
Number of contributors
Job Postings
Google keyword searches
Real-world adoption vs vendor-developed products
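One way to make market-driven selection concrete is a weighted score over normalised adoption signals. A hypothetical sketch in Python (the weights and metric values are illustrative, not real data):

```python
def adoption_score(metrics, weights):
    """Combine public adoption signals (commits, contributors,
    job postings, searches) into one comparable number per technology.
    Inputs are assumed normalised to [0, 1]."""
    return sum(weights[k] * metrics.get(k, 0.0) for k in weights)

# Hypothetical, illustrative numbers only.
weights = {"github_commits": 0.3, "contributors": 0.3, "job_postings": 0.4}
candidate = {"github_commits": 0.9, "contributors": 0.8, "job_postings": 0.7}
print(round(adoption_score(candidate, weights), 2))  # → 0.79
```

The score itself matters less than tracking the same signals consistently across the candidates being compared.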
Use best solution fit for purpose
Just because a technology is part of the big data stack does
not mean it is good.
Evaluate big data technologies as you would any other
technology choice.
Pick and choose the best from the options available, not
just the best-fitting big data solution.
e.g. HBase
Use big data to test hypotheses
Run full scale tests quickly
Reverse engineer quickly rather than analyse all paths
upfront
Recommendations
Be strictly agile
Reduced time to market for delivering risk management
Driven by business benefit
Simplify Stack
Spark + Scala = Big Data DSL
Add abstractions later
Use market information to evaluate technologies
Leverage big data competitive advantages
Scala + Spark = Big Data DSL
Scala + Spark + HDFS Interfaces
concise, versatile (ETL, aggregation, compute, pluggable
storage, query, caching, etc.)
Standards in Open source/Hadoop ecosystem
Extensible
Enables agility
Defining an MVP is hard to get right
Thank You
•Rate my session
•Feel free to reach out
•Ready-to-go risk analytics

BDW16 London - Deenar Toraskar, Think Reactive - Fast Data Key to Efficient Capital Management
