Using Machine Learning in Anti Money Laundering – Part 1
Background
Machine learning is being used, or experimented with, in all sorts of areas. Financial institutions are
leveraging (or looking to leverage) machine learning and Artificial Intelligence to improve how they run
their business.
In my desire to learn and understand machine learning, I decided to use an Anti Money Laundering (AML)
use case to see how machine learning can be applied to a real business scenario. AML activities include
Know Your Customer, Customer Due Diligence, Transaction Monitoring, SAR filing, Sanctions Screening, etc.
Customer Risk Rating
During Customer Due Diligence, financial institutions perform a customer risk assessment to determine the
overall risk rating of a customer. This typically follows the risk rating methodology defined by the
Compliance group. A customer is assessed against several risk factors and given a score, and the risk
rating is assigned based on that score. The risk factors broadly fall into Geography
risk, Industry risk, Product risk, Channel risk, Relationship risk, Political risk, etc.
The customer risk rating is determined using a rules-based score, and one could argue that this is not an
ideal candidate for a machine learning use case. However, it is precisely for this reason that I want to
use it: because the correct rating is already known from the rules, I can try various machine learning
models and determine how accurate they are.
I have used a customer risk rating model with a limited number of risk factors. The risk factors that I
have used are:
1. Politically Exposed Person
2. Country of Residence
3. Length of Relationship
4. Number of Products
5. Net worth
6. Primary Product
Based on these risk factors, a risk score is calculated and the customer is classified as a Low, Medium or
High risk customer.
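
A rules-based score of this kind could be sketched in Python as follows. The point values, thresholds, country names and product names below are hypothetical placeholders chosen only for illustration; they are not the values used to generate the data in this article.

```python
# Hypothetical lookups: placeholder names, not a real risk methodology.
HIGH_RISK_COUNTRIES = {"Country X", "Country Y"}
HIGH_RISK_PRODUCTS = {"Private Banking"}


def risk_score(customer):
    """Add up points for each of the six risk factors (assumed weights)."""
    score = 0
    if customer["is_pep"]:
        score += 50  # Politically Exposed Person
    if customer["residence_country"] in HIGH_RISK_COUNTRIES:
        score += 20  # Country of Residence
    if customer["relationship_years"] < 2:
        score += 10  # Length of Relationship (newer customers score higher)
    if customer["num_products"] > 3:
        score += 5   # Number of Products
    if customer["net_worth"] > 1_000_000:
        score += 10  # Net worth
    if customer["primary_product"] in HIGH_RISK_PRODUCTS:
        score += 10  # Primary Product
    return score


def risk_class(score):
    """Map a score to Low, Medium or High using assumed cut-offs."""
    if score >= 50:
        return "High"
    if score >= 20:
        return "Medium"
    return "Low"


customer = {
    "is_pep": False,
    "residence_country": "Country X",
    "relationship_years": 8,
    "num_products": 2,
    "net_worth": 250_000,
    "primary_product": "Savings",
}
score = risk_score(customer)
print(score, risk_class(score))
```

With these assumed weights, being a PEP alone is enough to reach the High threshold, which mirrors the kind of rule a compliance methodology might encode.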
Algorithms
Before I get into the machine learning experiments, I want to thank Microsoft for making Azure Machine
Learning Studio available for learning. I also want to thank edX.org for the machine learning classes
that it makes available.
There are many machine learning algorithms available and I am going to experiment (this is still a work in
progress and my experiments will continue) with the following broad categories of algorithms:
- Classification
- Regression
- Clustering
Classification is supervised learning that is used to predict a category. In this case the category is the
customer risk classification. There are three risk classifications: Low, Medium and High. Because there
are more than two categories, I used multi-class classification models.
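
To make the idea of multi-class classification concrete, here is a minimal sketch of a k-nearest-neighbours classifier in plain Python. This is a swapped-in technique chosen because it fits in a few lines; it is not one of the Azure Machine Learning Studio models used in these experiments, and the tiny training set and its feature encoding are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented rows: (is_pep, high_risk_country, relationship length scaled to 0-1)
train = [
    ((0, 0, 0.9), "Low"),
    ((0, 0, 0.8), "Low"),
    ((0, 0, 0.7), "Low"),
    ((0, 1, 0.5), "Medium"),
    ((0, 1, 0.4), "Medium"),
    ((0, 1, 0.6), "Medium"),
    ((1, 0, 0.2), "High"),
    ((1, 1, 0.1), "High"),
    ((1, 1, 0.3), "High"),
]

# One query per class; each prediction is one of three possible labels.
for query in [(0, 0, 0.85), (0, 1, 0.5), (1, 1, 0.2)]:
    print(query, knn_predict(train, query))
```

The point of the sketch is only that the model chooses among three labels rather than two; any multi-class algorithm could take the place of the nearest-neighbour vote.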
Regression algorithms are used when a numeric value is being predicted. In my experiments, I will predict
the risk score and then use the predicted score to risk rate a customer.
Clustering is an unsupervised learning technique that is used to segment data into clusters of similar
items. I will do this after the classification and regression experiments.
Preparing Data
Preparing data to train machine learning models usually consumes a lot of time, but since I created the
data myself, there was no real data quality, munging or cleansing work to do. However, I had to do some
data prep work before I could start on my experiments. The data work that I did was:
- Removed one of the columns that I am not going to use
- Set the datatype of
o IsPEP, Residence Country, Primary Product and Risk Class to String
o Relationship Length, Number of Products and Networth to Integer
o Risk Score to Float
- Set IsPEP, Residence Country and Primary Product as categorical variables
- Set IsPEP, Residence Country, Relationship Length, Number of Products, Networth, Primary Product and Risk Score as features
- Set Risk Class as the label
- Normalized Relationship Length using a MinMax transformation to values between 0 and 1
- Normalized Networth using a ZScore transformation
- Did not normalize Risk Score
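
The two normalization steps above can be sketched in plain Python as follows. The sample values are invented, and the use of the population standard deviation in the ZScore is an assumption; Azure Machine Learning Studio may compute it differently.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre values on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: every value is the mean
    return [(v - mu) / sigma for v in values]


# Invented sample columns
relationship_length = [1, 5, 10, 20]
net_worth = [50_000, 250_000, 1_000_000, 5_000_000]

scaled_length = min_max(relationship_length)
scaled_worth = z_score(net_worth)
print(scaled_length)
print(scaled_worth)
```

MinMax keeps a bounded column such as Relationship Length inside a fixed 0 to 1 range, while ZScore suits a widely spread column such as Networth because it expresses each value as a distance from the mean.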
A quick note on features and labels. Features are the fields that a machine learning algorithm uses to
make a prediction. The label is the target variable that is to be predicted.
More on the classification experiments in Part 2.
Sundries
The data that I am using is dummy data. I created it based on my experience so that it reflects
real life scenarios. E.g. if a customer is a PEP, that customer would in all likelihood be classified as
High risk.
The experiments done and the outcomes documented are my personal views and don’t reflect the views of
any organization.
