REAL TIME FRAUDULENT WEB BEHAVIOR DETECTION
Jeff Niemann
Randal Hanak
Jeff Niemann
● Analyzing behavior at NeuroID for
4 years
● Focus on capturing behaviors
indicative of fraud
● Working with JavaScript and
Apache Flink
Randy Hanak
● Been with NeuroID for 1 year.
● Focusing on reliability,
performance, and scaling our
Flink pipeline as well as
downstream consumers.
● Also deployment and
developer experience in Flink
repositories.
Agenda
About NeuroID
Behavior Detection in Flink
High Level Architecture
Challenges and Solutions
Demo of ID Orchestrator
Large Scale Real Time Fraudulent Web Behavior Detection
The Digital
Identity Crisis
is the defining challenge of
modern financial services.
4.5M fake users
Avis, Hertz refuse Chime payments
Selectively bans bank transfers
Post-submit data is causing the crisis
Critical identity decisions are all made post-submit
Pre-submit data enhances all other data
Maximize the identity investments you’ve already made using pre-submit
Architecture Diagram
Web Page Behaviors
Genuine
- Know their personal information
Risky (Fraudulent)
- Don’t know personal information
Bots
- Rapidly filling form
High risk identity
1. Importing first & last name into the First Name field
2. Cutting the last name out of the First Name field (a fraudster efficiency trick)
3. Filling the form out of order – entering fields in the order the stolen data is stored rather than the form order
4. Navigating off the application & back onto the SSN field (looking up info)
5. Hesitation throughout street address entry – utilizing Short-Term Memory (looking up info in chunks)
Risky Behaviors
Confidential & Proprietary Limited Distribution
Genuine identity
Confidential & Proprietary Limited Distribution
1. Navigating the form with focus
2. No mistakes entering personal information
3. Personal details entered via Long-Term Memory – no pauses or hesitation
Genuine Behaviors
Bots/automated activity - what are they?
Automation of the onboarding process
● Automatic scripts that run through the form entering data
Why would you use a bot?
● For making accounts
● For prefill style attacks (insurance) - mining data off the page
What do they look like behaviorally?
Unusual/headless browsers
Extremely fast typing – on the order of 1000x faster than a human
Consistent typing
Fast transitions
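As a rough illustration of these behavioral signals (a hypothetical heuristic, not NeuroID's actual model – all names and thresholds here are assumptions), superhuman speed and metronome-like consistency can be scored from inter-keystroke gap statistics:

```python
# Illustrative sketch: flag sessions whose inter-keystroke gaps are both
# implausibly fast and implausibly consistent. Thresholds are made up.
from statistics import mean, pstdev

def looks_like_bot(keystroke_times_ms, min_gap_ms=30.0, max_cv=0.15):
    """keystroke_times_ms: ascending timestamps of key events in one field."""
    gaps = [b - a for a, b in zip(keystroke_times_ms, keystroke_times_ms[1:])]
    if len(gaps) < 5:
        return False  # not enough signal to decide
    avg = mean(gaps)
    cv = pstdev(gaps) / avg if avg else 0.0  # coefficient of variation
    # Humans type with gaps well above ~30 ms and noticeable variance;
    # scripted input is fast AND metronome-consistent.
    return avg < min_gap_ms and cv < max_cv

# A script firing a key every 5 ms is flagged; a human cadence is not.
bot = looks_like_bot([i * 5 for i in range(20)])           # True
human = looks_like_bot([0, 140, 310, 420, 640, 780, 990])  # False
```

A real detector would combine many such signals (browser fingerprint, field transitions, mouse movement) rather than typing cadence alone.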
Behaviors
Flink Keyed Processors
Confidential & Proprietary Limited Distribution
Flink Keyed Processors
Purpose: Group and annotate behaviors that are
indicative of fraud so that analytical tools can
quickly generate an outcome.
Rate Limiting Events
● Cleaning data as early as possible
● Expiring state on an interval
Aggregating Behavior
● Track types of fields interacted with
● Compare click events throughout session
● Group of events that represent a behavior
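The rate-limiting idea above can be sketched roughly as follows. This is a hypothetical Python sketch of the logic only – the production pipeline uses Flink keyed process functions (in Java) with state TTL, and the class and parameter names here are illustrative:

```python
# Sketch: cap per-session event rates to human speed and expire idle
# session state on an interval, mimicking keyed state with a TTL.
import time

class SessionRateLimiter:
    def __init__(self, max_events_per_sec=25, state_ttl_sec=900):
        self.max_rate = max_events_per_sec
        self.ttl = state_ttl_sec
        self.state = {}  # session_id -> (window_start, count, last_seen)

    def allow(self, session_id, now=None):
        now = time.time() if now is None else now
        start, count, _ = self.state.get(session_id, (now, 0, now))
        if now - start >= 1.0:          # roll the one-second window
            start, count = now, 0
        count += 1
        self.state[session_id] = (start, count, now)
        return count <= self.max_rate   # drop superhuman bursts early

    def expire(self, now=None):
        """Run on an interval (like a Flink timer) to drop idle sessions."""
        now = time.time() if now is None else now
        idle = [s for s, (_, _, seen) in self.state.items() if now - seen > self.ttl]
        for sid in idle:
            del self.state[sid]
```

Cleaning data this early keeps the downstream aggregation operators from carrying state for bot floods.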
Database Sink
Send grouped events to a database
We will discuss these challenges and solutions further in the next section, on scaling and speed.
Scaling and Speed
Confidential & Proprietary Limited Distribution
Scaling and Speed
● Score generated
within 3 seconds
of first interaction
on page
● Score continually
updated with
interactions
Scaling and Speed
● 161 million events processed per day.
● Events aggregated into sessions.
● Sessions processed within the last 15 minutes.
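A quick back-of-the-envelope check on the stated volume (my arithmetic, not a figure from the talk): 161 million events per day averages out to roughly 1,900 events per second, with peaks necessarily higher.

```python
# Average sustained event rate implied by 161M events/day.
events_per_day = 161_000_000
seconds_per_day = 86_400
avg_per_sec = events_per_day / seconds_per_day  # ~1,863 events/sec
```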
Scaling and Speed
How we scale downstream consumers.
● Kinesis trigger with batch
● S3 with ObjectCreated event
● Asynchronous invocation from Flink to get the required concurrency.
Scaling and Speed
Scaling and Speed
Kinesis consumer
● Record size limit: 1 MB.
● Kinesis with the standard iterator: the Lambda service polls each shard in your
stream once per second for records over HTTP. With a batch window of 0, one
consumer can achieve about 200 ms data-retrieval latency, since a single
consumer can read up to 5 times per second per shard.
● A dedicated-throughput consumer with enhanced fan-out can achieve roughly
70 ms latency. Stream consumers use HTTP/2 to push records to Lambda over
a long-lived connection.
Scaling and Speed
S3 consumer
● Record size is not a problem.
● Downstream consumers are triggered by the ObjectCreated event,
which is an asynchronous Lambda invoke, allowing easy scaling.
● The downside: for latency, you want all relevant data packed into
one S3 file, which requires accumulating that data in Flink state.
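The accumulate-then-flush tradeoff for S3 can be sketched like this (a minimal Python sketch with assumed helper names – the real pipeline holds the buffer in Flink keyed state and writes via the S3 API):

```python
# Sketch: buffer each session's events so all relevant data lands in a
# single S3 object, at the cost of holding it in state until the flush.
import json

class SessionAccumulator:
    def __init__(self, put_object, flush_after=5):
        self.put_object = put_object      # would wrap s3.put_object in real life
        self.flush_after = flush_after    # flush condition: count, timer, etc.
        self.buffers = {}                 # session_id -> list of events

    def on_event(self, session_id, event):
        buf = self.buffers.setdefault(session_id, [])
        buf.append(event)
        if len(buf) >= self.flush_after:
            key = f"sessions/{session_id}.json"
            self.put_object(key, json.dumps(buf))  # one object per session
            del self.buffers[session_id]           # clear state after flush
            return key
        return None
```

The ObjectCreated trigger then fires once per session file, so the downstream Lambda sees the whole session in one invoke.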
Accumulating session in state
Summary: What Worked Well?
● The deciding factor is the size of our messages.
● When messages are small and don't require the context of
previous messages in the session, the decision is
easier – either Kinesis or S3 works.
● Our messages require the context of the whole session.
Scaling and Speed
Database consumer
● Record size: 200 KB.
● The downstream consumer can be triggered by Flink with an asynchronous invoke
using the Lambda API.
● Consumers are then given an ID and a range of items to read from the
database.
● We partition data larger than 200 KB into items in a database table.
This lets us read backwards through a session and grab the relevant records
quickly in the Lambda.
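The partitioning scheme can be sketched as follows. This is a hypothetical illustration – the item and attribute names (`pk`, `sk`, `part#...`) are my assumptions, not the actual table schema:

```python
# Sketch: split a record larger than ~200 KB into numbered items under the
# session's partition key, so a consumer can query the session key range
# and reassemble the record quickly.
CHUNK_BYTES = 200 * 1024

def to_items(session_id, payload: bytes):
    """Split one record into ordered DB items of at most CHUNK_BYTES each."""
    chunks = [payload[i:i + CHUNK_BYTES] for i in range(0, len(payload), CHUNK_BYTES)]
    return [
        {"pk": session_id, "sk": f"part#{n:04d}", "body": chunk}
        for n, chunk in enumerate(chunks)
    ]

def from_items(items):
    """Reassemble by sort key, as read back from a key-range query."""
    return b"".join(i["body"] for i in sorted(items, key=lambda i: i["sk"]))
```

Zero-padded part numbers keep lexicographic sort-key order equal to chunk order, so a backwards range read returns the newest parts first.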
Scaling and Speed
Summary
● Pick your transport based on latency requirements, message size, and
required concurrency.
● Use an async keyed process function to store records to the database.
● Invoke the Lambda from Flink, scaling it to reduce latency.
Monitoring and Alerting
Confidential & Proprietary Limited Distribution
Monitoring and Alerting
Identifying issues
Monitoring and Alerting
Issues resolved
Confidential & Proprietary Limited Distribution
Monitoring Size
Blue Green Deployments
Confidential & Proprietary Limited Distribution
Blue/Green Deployments
Deploying new
versions of our Flink
processors without
interrupting current
sessions.
Blue/Green Deployments
Additional details
● Properly keeping the mapping of clients to streams and
providing a default.
● Keeping latency low while maintaining this mapping.
● Metrics for finding any bugs – we found issues that only
showed up during a traffic switch.
Final Learnings
Confidential & Proprietary Limited Distribution
What worked well
● Transition from the Kryo serializer to the POJO serializer
What worked well
● Avoid transferring uncompressed records over the network
● Ensure functions end up in the same operator group so that
records can be passed efficiently between them. Partition key and
parallelism are among the requirements for being in the same group.
PyFlink Learnings
● Initially worked with PyFlink because we were
comfortable with Python
● PyFlink wasn't a good choice for performance
reasons
● Deploying with Python on KDA (Kinesis Data Analytics)
also wasn't an option
Demo
Confidential & Proprietary Limited Distribution
Thank you
neuro-id.com


Editor's Notes

  • #2: Jeff and Randy to introduce themselves
  • #3: Randy Business logic abuse, vulnerability probing
  • #5: Randy to go over Agenda
  • #6: The industry-redefining behavioral analytics company applies patented neuroscience technology to measure how familiar users are with their inputted PII before they click ‘submit’ and enter a company’s fraud stack. NeuroID analyzes this pre-submit data in real-time and determines if users are genuine or risky, without adding any friction.
  • #7: Jeff What do we do at NeuroID? Additional stats: identity theft was a $721 billion problem in 2021. Citation: https://aite-novarica.com/report/us-identity-theft-stark-reality Change this slide to be more focused around our public customers
  • #8: Jeff SLIDE PURPOSE: Most applications make decisions on post submit data
  • #9: Jeff SLIDE PURPOSE: We now can add presubmit behavioral data to help funnel applicants. WHAT behaviors are we trying to detect? Risky Applicants Genuine Applicants Bots
  • #10: Jeff SLIDE PURPOSE: Details components in our architecture Highlight peripheral services around Flink. Very Brief! Remake diagram to be v3 diagrams
  • #12: Jeff SLIDE PURPOSE: Visually conceptualize how Neuro-ID identifies a fraudulent applicant. Now that you have a concept of genuine behaviors, let's watch this session and call out any bad behaviors
  • #13: Jeff SLIDE PURPOSE: Visually conceptualize how Neuro-ID identifies a genuine applicant. Now that you have the concept for the types of data we collect - I wanted to show an example of genuine behaviors For example, when I give my address: xxxxxxx, I can easily do it fluently, without any pauses or hesitation, using my long term memory When I type that info, I’m using that same muscle memory to type just as I spoke – fluently, without hesitation
  • #15: Randy
  • #16: Randy
  • #17: Randy Highlight here is that they filled out 5 fields in 3 seconds while mousing in between the fields!! That’s nuts!
  • #19: Jeff SLIDE PURPOSE: More details on how we utilize Flink. Stick to high level for services. Deaggregator - break event packets down. Rate Limiter - limit certain events to human speed. Session Metadata Tracker - track high-level details about the session. Target Type Annotator - track types of fields interacted with. Click Annotator - compare click events throughout the session. Segmenter - group events that represent a behavior. DynamoDB Sink - send grouped events to be scored.
  • #20: Jeff SLIDE PURPOSE: Discuss challenge of deploying changes to pipeline with active streams. Possibly break out to sink slide
  • #24: Jeff SLIDE PURPOSE: Discuss challenge of making pipeline very fast Metrics around timing Need more chart / numbers
  • #25: Jeff SLIDE PURPOSE: Discuss challenge of making pipeline very fast Add metrics around timing (Flink / Computing) Scaling dynamically with sessions
  • #26: Randy SLIDE PURPOSE: Discuss challenge of making pipeline very fast Add metrics around timing (Flink / Computing) Scaling dynamically with sessions Kinesis with Standard Iterator. Lambda service polls each shard in your stream one time per second for records using HTTP protocol. With batch window 0, can achieve 200-millisecond data retrieval latency for one consumer. Given one consumer can read up to 5 times per second per shard. Dedicated-throughput consumer with enhanced fan out that can achieve ~70 ms of latency. Stream consumers use HTTP/2 to push records to Lambda over a long-lived connection.
  • #27: Randy SLIDE PURPOSE: Discuss challenge of making pipeline very fast Add metrics around timing (Flink / Computing) Scaling dynamically with sessions
  • #28: Randy S3/Dynamo/redis to reduce costs in the size of our cluster Favored costs over accumulation in state.
  • #29: Randy
  • #30: Randy SLIDE PURPOSE: Discuss challenge of deploying changes to pipeline with active streams. Possibly break out to sink slide
  • #31: Randy
  • #32: Randy SLIDE PURPOSE: Discuss challenge of making pipeline very fast Add metrics around timing (Flink / Computing) Scaling dynamically with sessions
  • #34: Jeff Found backpressure in certain keyed process functions. Allowed us to iterate to make them work smoother
  • #36: Jeff SLIDE PURPOSE: Discuss how we monitor size of sessions.
  • #41: Randy SLIDE PURPOSE: Discuss challenge of deploying changes to pipeline with active streams. Diagram show not tell
  • #43: Jeff SLIDE PURPOSE: Discuss challenge of deploying changes to pipeline with active streams. Extra slide
  • #44: Jeff and Randy to introduce Demo
  • #45: Additional stats: identity theft was a $721 billion problem in 2021. Citation: https://aite-novarica.com/report/us-identity-theft-stark-reality