Online Fraud Detection
Fighting financial crime with machine learning.
Introduction
Everyone is exposed to financial fraud. If you're selling or buying something online or providing financial services, you face fraud risks every day. Scams hit businesses especially hard, because you're not only losing money but also customers who may no longer trust you. So detecting and preventing fraud is essential.
Online Fraud
• E-commerce
• Social media
• Digital advertising
• Online banking
Steps
01 Data Exploration & Preprocessing
02 Feature Engineering
03 Selection & Training of the Model
04 Model Evaluation
05 Model Testing
01 Data Exploration & Preprocessing
We first explored our datasets to gain insights into their structure and characteristics.
Our transactions dataset contains 200,000 records with 55 features; the fraud transactions dataset contains 8,640 transactions with 8 features.
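As a minimal sketch of this step (the file names transactions.csv and fraud_transactions.csv are assumptions, not the actual dataset paths):

    import pandas as pd

    # Hypothetical file names; adjust to the real dataset locations
    transactions = pd.read_csv("transactions.csv")   # ~200,000 rows, 55 features
    fraud = pd.read_csv("fraud_transactions.csv")    # ~8,640 rows, 8 features

    # Structure: shape, column types, and per-column missing-value counts
    print(transactions.shape)
    print(transactions.dtypes)
    print(transactions.isna().sum().sort_values(ascending=False).head(10))

    # Summary statistics for numerical and categorical columns
    print(transactions.describe(include=["number"]))
    print(transactions.describe(include=["object"]))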
Visualization
We used various visualizations such as histograms, pie charts, and box plots to understand the distribution of and relationships between different features.
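For illustration, plots of this kind can be produced with matplotlib and seaborn; the column names transaction_amount and payment_type below are assumptions, and the transactions DataFrame comes from the exploration sketch above:

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Histogram of a continuous feature (column name is assumed)
    transactions["transaction_amount"].hist(bins=50)
    plt.title("Distribution of transaction amount")
    plt.show()

    # Pie chart of a categorical feature
    transactions["payment_type"].value_counts().plot.pie(autopct="%.1f%%")
    plt.title("Distribution of payment type")
    plt.show()

    # Box plot of a continuous feature split by the fraud label
    sns.boxplot(data=transactions, x="Label", y="transaction_amount")
    plt.title("Transaction amount by label")
    plt.show()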
Distribution of Payment Type
Credit Card 90.5%, PayPal 9.47%, Direct Debit 0.0005%, Inicis Payment 0.001%
Distribution of Card Type
Visa is the most commonly
used card, followed by MasterCard
and American Express
Transaction Currency Code
USD 79.2%, EUR 8.9%, GBP 6.6%, CAD 5.3%
Registered accounts
The number of unregistered user accounts is higher than the number of registered user accounts.
-----> Transactions from unregistered users might be considered higher risk
• Fraudsters often use proxy IPs to obfuscate their true location
-----> There are 1,092 suspicious transactions originating from Washington, indicating a potential fraud risk.
• The presence of a large number of transactions from these 30 states may raise suspicions about the legitimacy of those transactions
Transaction Hours Distribution
Most transactions happen between hour 10 and hour 20 of the day, which corresponds to 10:00 AM – 8:00 PM.
Tag the Data: Labeling Our Data for Supervised Learning
In our fraud detection project, the target variable is 'Label', which indicates whether a transaction is fraudulent or not.
• 0: Non-fraudulent transaction
• 1: Fraudulent transaction
We observe that the tagged data is imbalanced: there are fewer instances of the positive class than of the negative class.
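A quick way to see this imbalance, assuming the transactions DataFrame from the exploration sketch above:

    # Count and proportion of each class in the target variable
    print(transactions["Label"].value_counts())
    print(transactions["Label"].value_counts(normalize=True))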
Handling missing values
o The dataset had missing values in certain features, which can affect the quality and accuracy of the analysis.
Numerical values: using Multiple Imputation by Chained Equations (MICE), an iterative imputation method. It uses observed values from other variables to estimate missing values.
Categorical values: using the mode of the column, i.e. replacing missing values with the most frequent category in each categorical column.
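A sketch of both strategies with scikit-learn, using IterativeImputer (a MICE-style imputer) for numerical columns and a most-frequent SimpleImputer for categorical ones; selecting the columns automatically by dtype is an assumption:

    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer, SimpleImputer

    num_cols = transactions.select_dtypes(include="number").columns
    cat_cols = transactions.select_dtypes(include="object").columns

    # MICE-style iterative imputation for numerical features
    mice = IterativeImputer(max_iter=10, random_state=42)
    transactions[num_cols] = mice.fit_transform(transactions[num_cols])

    # Mode (most frequent category) imputation for categorical features
    mode = SimpleImputer(strategy="most_frequent")
    transactions[cat_cols] = mode.fit_transform(transactions[cat_cols])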
Outliers Detection & Handling
o One widely used method for identifying and handling outliers is "Winsorizing"
------> It's a data transformation technique that involves capping extreme values in a dataset at a specified percentile
------> It preserves the distributional characteristics of the original data while reducing the effect of outliers
(Box plots of the features after handling outliers)
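As an illustrative sketch, capping each numerical feature at the 1st and 99th percentiles with SciPy; the exact percentile limits are an assumption:

    import numpy as np
    from scipy.stats.mstats import winsorize

    # Cap extreme values at the 1st and 99th percentiles, column by column
    for col in num_cols:
        transactions[col] = np.asarray(winsorize(transactions[col], limits=(0.01, 0.01)))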
02 Feature Engineering
Feature engineering is a critical step to enhance the model's predictive power
Encoding Categorical Variables
Machine learning models typically expect numerical inputs, so categorical variables need to be encoded into a numeric representation before feeding them to the model.
Label Encoding: each category is mapped to an integer value.
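A minimal label-encoding sketch with scikit-learn; fitting one encoder per categorical column is an assumption about how it was applied:

    from sklearn.preprocessing import LabelEncoder

    # Map each category of every categorical column to an integer code
    for col in cat_cols:
        transactions[col] = LabelEncoder().fit_transform(transactions[col].astype(str))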
Point-biserial correlation
The correlation
between the binary
target variable and the
continuous features
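A sketch of this check with SciPy, computing the correlation of each continuous feature with the binary 'Label':

    from scipy.stats import pointbiserialr

    # Point-biserial correlation of each continuous feature with the target
    for col in num_cols:
        if col == "Label":
            continue
        r, p = pointbiserialr(transactions["Label"], transactions[col])
        print(f"{col}: r={r:.3f}, p={p:.3g}")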
03 Selection & Training of the Model
Split the data
The training set (80%) is used to train the model.
The test set (20%) is used to evaluate the model's performance on unseen data; the model makes predictions on this set.
The target variable is 'Label': the variable we want to predict.
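A sketch of the 80/20 split with scikit-learn; stratifying on the label is an assumption that simply keeps the fraud ratio similar in both sets:

    from sklearn.model_selection import train_test_split

    X = transactions.drop(columns=["Label"])
    y = transactions["Label"]

    # 80% training / 20% test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )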
Train the model
Using LightGBM:
For this classification task, LightGBM is an efficient and powerful open-source machine learning framework specifically designed for gradient boosting. It combines speed, memory efficiency, and accuracy.
During the training process, LightGBM calculates the importance of each feature based on how much it contributes to the model's accuracy.
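A minimal LightGBM training sketch using its scikit-learn interface; the hyperparameters shown are illustrative assumptions, not the project's tuned values:

    import pandas as pd
    import lightgbm as lgb

    # Gradient-boosted trees classifier (hyperparameters are illustrative)
    model = lgb.LGBMClassifier(
        n_estimators=500,
        learning_rate=0.05,
        num_leaves=31,
        class_weight="balanced",  # one common way to account for the class imbalance
        random_state=42,
    )
    model.fit(X_train, y_train)

    # Feature importances accumulated during training
    importances = pd.Series(model.feature_importances_, index=X_train.columns)
    print(importances.sort_values(ascending=False).head(10))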
04 Model Evaluation
ROC-AUC: Receiver Operating Characteristic - Area Under the Curve
It's a metric used to evaluate the performance of binary classification models
It measures the area under
the ROC curve, which is a
graphical representation of
the model's true positive rate
against the false positive rate
at different classification
thresholds
The ROC-AUC score is a useful metric for evaluating classifiers, especially in imbalanced
datasets where accuracy alone can be misleading
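A sketch of computing this score on the held-out test set, using the model's predicted fraud probabilities:

    from sklearn.metrics import roc_auc_score

    # Probability of the positive (fraud) class for each test transaction
    y_prob = model.predict_proba(X_test)[:, 1]
    print("ROC-AUC:", roc_auc_score(y_test, y_prob))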
The score is 0.81, which suggests that the model is performing well at distinguishing between fraud and non-fraud transactions.
Thank you for your attention!