
Fraud Detection

Industry: Retail & Online
Department: Finance
Role: Business executive

Processed Dataset

Data

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. The goal is to identify fraudulent credit card transactions. The feature 'Class' is the response variable; it takes the value 1 in case of fraud and 0 otherwise.
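
The imbalance can be checked directly from the two counts quoted above. The sketch below uses only those numbers (the variable names are illustrative) and also shows the plain accuracy that a model predicting "not fraud" for every row would reach, which is why plain accuracy says little on this dataset.

```python
# Counts quoted in the dataset description
n_transactions = 284_807
n_frauds = 492

# Share of the positive class (frauds), roughly 0.17% of all rows
fraud_rate = n_frauds / n_transactions
print(f"Fraud rate: {fraud_rate:.4%}")

# Plain accuracy of a trivial model that never flags a transaction
always_negative_accuracy = 1 - fraud_rate
print(f"Always-negative accuracy: {always_negative_accuracy:.4%}")
```

Because the trivial model already scores above 99.8% on plain accuracy, a metric that weights both classes equally is a better fit for this problem.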

Time V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
1 -1.35835 -1.34016 1.77321 0.37978 -0.503198 1.8005 0.791461 0.247676 -1.51465 0.207643 0.624501 0.0660837 0.717293 -0.165946 2.34586 -2.89008 1.10997 -0.121359 -2.26186 0.52498 0.247998 0.771679 0.909412 -0.689281 -0.327642 -0.139097 -0.0553528 -0.0597518 378.66 0
4 1.22966 0.141004 0.0453708 1.20261 0.191881 0.272708 -0.005159 0.0812129 0.46496 -0.0992543 -1.41691 -0.153826 -0.751063 0.167372 0.0501436 -0.443587 0.00282051 -0.611987 -0.045575 -0.219633 -0.167716 -0.27071 -0.154104 -0.780055 0.750137 -0.257237 0.0345074 0.00516777 4.99 0
11 1.06937 0.287722 0.828613 2.71252 -0.178398 0.337544 -0.0967169 0.115982 -0.221083 0.46023 -0.773657 0.323387 -0.0110759 -0.178485 -0.655564 -0.199925 0.124005 -0.980496 -0.982916 -0.153197 -0.0368755 0.0744124 -0.0714074 0.104744 0.548265 0.104094 0.0214911 0.0212933 27.5 0
14 -5.40126 -5.45015 1.1863 1.73624 3.04911 -1.76341 -1.55974 0.160842 1.23309 0.345173 0.91723 0.970117 -0.266568 -0.47913 -0.526609 0.472004 -0.725481 0.0750814 -0.406867 -2.19685 -0.5036 0.98446 2.45859 0.0421189 -0.481631 -0.621272 0.392053 0.949594 46.8 0
29 1.11088 0.168717 0.517144 1.32541 -0.191573 0.0195037 -0.0318491 0.11762 0.0176647 0.0448648 1.34507 1.28634 -0.252267 0.274458 -0.810394 -0.587005 0.0874511 -0.550474 -0.154749 -0.19012 -0.0377087 0.0957015 -0.0481976 0.232115 0.606201 -0.342097 0.0367696 0.00747996 6.54 0
33 -0.607877 1.03135 1.74045 1.23211 0.418592 0.119168 0.850893 -0.176267 -0.243501 0.148455 -0.387003 0.398299 0.481917 -0.365439 0.235545 -1.34781 0.504648 -0.798405 0.75971 0.254325 -0.0873292 0.258315 -0.264775 0.118282 0.173508 -0.217041 0.0943119 -0.0330413 14.8 0
35 1.3864 -0.794209 0.778224 -0.864708 -1.06413 0.351296 -1.19145 0.0526856 -0.304404 0.576517 -1.63111 0.0425595 2.0479 -0.739338 1.45622 -0.27205 -0.932007 1.92653 -0.659939 -0.273033 -0.228727 -0.123522 -0.131025 -0.929668 0.181379 1.19493 0.000531332 0.0199106 30.9 0
Features information:
* Time: Number of seconds elapsed between this transaction and the first transaction in the dataset
* V1: May be the result of a PCA dimensionality reduction applied to protect user identities and sensitive features (applies to V1 through V28)
* V2
* V3
* V4
* V5
* V6
* V7
* V8
* V9
* V10
* V11
* V12
* V13
* V14
* V15
* V16
* V17
* V18
* V19
* V20
* V21
* V22
* V23
* V24
* V25
* V26
* V27
* V28
* Amount: Transaction amount
* Class: 1 for fraudulent transactions, 0 otherwise
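
The column layout described above can be written down as a small schema check. This is a minimal sketch, assuming rows are whitespace-separated as they are displayed on this page; `EXPECTED_COLUMNS` and `parse_row` are hypothetical names introduced for illustration, and the actual CSV files would more likely be read with pandas.

```python
# Expected column layout, following the feature list above:
# Time, V1..V28, Amount, Class (31 columns in total)
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def parse_row(line):
    """Map one whitespace-separated row to a dict keyed by column name.

    Hypothetical helper for illustration only.
    """
    values = line.split()
    if len(values) != len(EXPECTED_COLUMNS):
        raise ValueError(
            f"expected {len(EXPECTED_COLUMNS)} values, got {len(values)}"
        )
    row = dict(zip(EXPECTED_COLUMNS, (float(v) for v in values)))
    row["Class"] = int(row["Class"])  # 1 = fraud, 0 = otherwise
    return row


# Second sample row from the table above (Time = 4)
sample = (
    "4 1.22966 0.141004 0.0453708 1.20261 0.191881 0.272708 -0.005159 "
    "0.0812129 0.46496 -0.0992543 -1.41691 -0.153826 -0.751063 0.167372 "
    "0.0501436 -0.443587 0.00282051 -0.611987 -0.045575 -0.219633 -0.167716 "
    "-0.27071 -0.154104 -0.780055 0.750137 -0.257237 0.0345074 0.00516777 "
    "4.99 0"
)

row = parse_row(sample)
print(row["Time"], row["Amount"], row["Class"])
```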

MindsDB Code example

import mindsdb
import pandas as pd
from sklearn.metrics import balanced_accuracy_score

def run():
    # Backend name reported with the results (Lightwood)
    backend = 'lightwood'

    mdb = mindsdb.Predictor(name='cc_fraud')

    mdb.learn(from_data='processed_data/train.csv', to_predict='Class')

    predictions = mdb.predict(when_data='processed_data/test.csv')

    pred_val = [int(x['Class']) for x in predictions]
    real_val = [int(x) for x in pd.read_csv('processed_data/test.csv')['Class']]

    accuracy = balanced_accuracy_score(real_val, pred_val)

    # Show additional info for each transaction row
    additional_info = [x.explanation for x in predictions]

    return {
        'accuracy': accuracy,
        'backend': backend,
        'additional info': additional_info
    }

# Run as main
if __name__ == '__main__':
    print(run())
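
The script scores predictions with scikit-learn's `balanced_accuracy_score`, which averages the recall obtained on each class. The sketch below re-implements that idea in plain Python to show why it suits an unbalanced dataset; `balanced_accuracy` and `plain_accuracy` are hypothetical helpers, not the scikit-learn functions, and the labels are made-up toy data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (hypothetical helper for illustration)."""
    recalls = []
    for label in sorted(set(y_true)):
        # Positions of the rows that truly belong to this class
        idx = [i for i, y in enumerate(y_true) if y == label]
        correct = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def plain_accuracy(y_true, y_pred):
    """Fraction of rows where the prediction matches the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Toy labels: 98 legitimate transactions and 2 frauds (made-up data)
y_true = [0] * 98 + [1] * 2

# A model that never flags fraud
always_negative = [0] * 100

print(plain_accuracy(y_true, always_negative))     # 0.98
print(balanced_accuracy(y_true, always_negative))  # 0.5
```

On the toy data the never-flag model looks strong on plain accuracy but scores only 0.5 on balanced accuracy, because its recall on the fraud class is zero.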

MindsDB accuracy

Accuracy Backend Last run MindsDB Version Latest Version
0.921724518459069 Lightwood 16 April 2020 MindsDB PyPi Version