Bank Fraud detection with machine learning

September 17, 2020 by Bluetab


The financial sector is currently immersed in the fight against bank fraud, one of its biggest challenges. Spanish banking saw a 17.7% increase in 2018 in claims for improper transactions or charges compared to the previous year, and in 2017 alone there were over 123,064 online fraud incidents against companies and individuals.

The banking system is confronting this battle from a technological standpoint. It is in the midst of a digitisation process and, with investments of around €4,000 million per year, it is focusing its efforts on the adoption of new technologies such as Big Data and Artificial Intelligence. These technologies are intended to improve and automate various processes, including the detection and management of fraud.

At /bluetab we are undertaking a variety of Big Data and Artificial Intelligence initiatives in the financial sector. Within our “Advanced Analytics & Machine Learning” practice, we are currently collaborating on Security and Fraud projects where, through the use of Big Data and Artificial Intelligence, we help our clients build more accurate predictive models.

So, how can machine learning help prevent bank fraud? Focusing on our collaborations in the fraud area, /bluetab approaches these initiatives starting from a set of transfers identified as fraud and a dataset of user sessions in electronic banking. The challenge is to generate a model that can predict when a session may be fraudulent while keeping in check the false positives and false negatives the model may produce.
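
Since fraud is a rare event, raw accuracy says little; the trade-off between false positives (legitimate sessions flagged) and false negatives (fraud that slips through) is what matters. A minimal scikit-learn sketch, with made-up labels and predictions, of how that trade-off can be measured:

# Illustrative only: y_true / y_pred are placeholder labels and model output
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 0, 1, 0, 0, 1, 0, 1]

# ravel() on a binary confusion matrix yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"false positives={fp}, false negatives={fn}")
print(f"precision={precision_score(y_true, y_pred):.2f}, recall={recall_score(y_true, y_pred):.2f}")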

Understanding the business and the data is critical to successful modelling.

In overcoming these kinds of technological challenges, we have found that a sound methodology is of vital importance. At /bluetab we use an in-house adaptation of the CRISP-DM methodology, tailored to Banking, in which we distinguish the following phases:

  • Understanding the business
  • Understanding the data
  • Data quality
  • Construction of intelligent predictors
  • Modelling


We believe that in online fraud detection projects, understanding the business and the data is essential for proper modelling. Good data analysis lets us observe how the variables relate to the target variable (fraud), as well as other statistical aspects of no less importance (data distribution, search for outliers, etc.). These analyses reveal variables with great predictive capacity, which we call “diamond variables”. Attributes such as the number of visits to the website, the device used for the connection, the operating system or the browser used for the session (among others) are usually strongly related to bank fraud. Moreover, the study of these variables shows that, individually, they can cover over 90% of fraudulent transactions. In other words, analysing and understanding the business and the data lets you evaluate the best way of approaching a solution without getting lost in a sea of data.
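
As an illustration of this kind of analysis, the sketch below measures what share of all fraudulent sessions each value of a candidate variable covers. The dataset and column names (sessions.parquet, is_fraud, browser) are hypothetical placeholders:

import pandas as pd

# Hypothetical session dataset; column names are illustrative
sessions = pd.read_parquet("sessions.parquet")

def fraud_coverage(df: pd.DataFrame, column: str, target: str = "is_fraud") -> pd.Series:
    """Share of all fraudulent sessions concentrated in each value of `column`."""
    fraud = df[df[target] == 1]
    return (fraud[column].value_counts() / len(fraud)).sort_values(ascending=False)

# e.g. how much of the fraud each browser concentrates
print(fraud_coverage(sessions, "browser"))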

Once you understand the business and the data, and after having identified the variables with the greatest predictive power, it is essential to have tools and processes that ensure the quality of those variables. Training predictive models on reliable variables and historical data is indispensable: training on low-quality variables can lead to erratic models with serious impacts on the business.
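
A minimal sketch of such a quality check, assuming the same hypothetical dataset as above and an arbitrary 20% completeness threshold:

import pandas as pd

def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Basic per-column quality indicators: completeness, cardinality, type."""
    return pd.DataFrame({
        "null_rate": df.isna().mean(),
        "n_unique": df.nunique(),
        "dtype": df.dtypes.astype(str),
    })

report = quality_report(pd.read_parquet("sessions.parquet"))
print(report[report["null_rate"] > 0.20])  # assumed 20% completeness threshold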

After ensuring the reliability of the selected predictor variables, the next step is to construct intelligent predictor variables. Even though the variables selected in the previous steps have a strong relationship with the variable to be predicted (the target), they can behave poorly during modelling, which is why data preparation is necessary. This preparation involves adapting the variables to the algorithm, for example by handling nulls or encoding categorical variables. In addition, the outliers identified in the previous steps must be handled properly to avoid feeding the model information that could distort it.
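
A hedged scikit-learn sketch of this preparation, with illustrative column names: median imputation for numeric nulls, one-hot encoding for categorical variables, and a scaler that is robust to the outliers identified earlier:

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

numeric = ["n_visits", "session_seconds"]   # assumed numeric predictors
categorical = ["device", "os", "browser"]   # assumed categorical predictors

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", RobustScaler()),          # based on medians/quantiles, so less sensitive to outliers
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])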

Similarly, with the aim of “tuning” the result, it is of vital importance to apply various transformations to the variables to improve the model’s predictive value. Basic mathematical transformations such as exponentials, logarithms or standardisation, together with more complex ones such as Weight of Evidence (WoE), make it possible to substantially improve the quality of the predictive models: more highly processed variables make the model’s task easier.
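
As a sketch of the WoE idea (again with hypothetical column names), each category of a variable is replaced by the logarithm of the ratio between its share of legitimate sessions and its share of fraudulent ones:

import numpy as np
import pandas as pd

def woe(df: pd.DataFrame, column: str, target: str = "is_fraud") -> pd.Series:
    """WoE per category: log of the non-fraud share over the fraud share."""
    good = df[df[target] == 0][column].value_counts(normalize=True)
    bad = df[df[target] == 1][column].value_counts(normalize=True)
    return np.log(good / bad)  # categories seen on only one side yield NaN

sessions = pd.read_parquet("sessions.parquet")  # hypothetical dataset as before
sessions["browser_woe"] = sessions["browser"].map(woe(sessions, "browser"))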

Finally, the modelling stage focuses on pitting different types of algorithms with different hyperparameter configurations against each other to arrive at the model that generates the best prediction. This is where tools such as Spark help enormously, as distributed programming makes it possible to train different algorithms and configurations quickly.
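
A minimal PySpark sketch of this stage, assuming a prepared DataFrame train_df with an assembled features column and an is_fraud label (both hypothetical names):

from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

gbt = GBTClassifier(featuresCol="features", labelCol="is_fraud")
grid = (ParamGridBuilder()
        .addGrid(gbt.maxDepth, [3, 5, 7])
        .addGrid(gbt.maxIter, [20, 50])
        .build())

cv = CrossValidator(estimator=gbt,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(labelCol="is_fraud"),
                    numFolds=3,
                    parallelism=4)   # candidate models are trained concurrently

# model = cv.fit(train_df)   # train_df: prepared DataFrame, assumed available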

To keep the application sustainable and avoid model obsolescence, this methodology needs to be repeated periodically for each use case, monthly or even more frequently for an initiative such as bank fraud, because new forms of fraud may arise that are not covered by the trained models. It is therefore important to understand and select the variables with which to retrain the models so that they do not become obsolete over time, which could seriously harm the business.

In summary, a good working methodology is vital when addressing problems in the world of Artificial Intelligence and Advanced Analytics, and the phases for understanding the business and the data are essential. Having specialised internal tools that enable these types of projects to be executed in just a few weeks is now a must in order to generate quick wins for our clients and their business.


Spying on your Kubernetes with Kubewatch

September 14, 2020 by Bluetab


At Cloud Practice our aim is to encourage adoption of the cloud as a way of working in the IT world. To help with this task, we will be publishing numerous articles on good practices and use cases, and others discussing key cloud services.

This time we will talk about Kubewatch.

What is Kubewatch?

Kubewatch is a utility developed by Bitnami Labs that watches a Kubernetes cluster for resource changes and sends notifications to various communication systems.

Supported webhooks are:

  • Slack
  • Hipchat
  • Mattermost
  • Flock
  • Webhook
  • Smtp

Kubewatch integration with Slack

The available images are published in the bitnami/kubewatch GitHub repository.

You can download the latest version to test it in your local environment:

$ docker pull bitnami/kubewatch 

Once inside the container, you can play with the options:

$ kubewatch -h

Kubewatch: A watcher for Kubernetes

kubewatch is a Kubernetes watcher that publishes notifications
to Slack/hipchat/mattermost/flock channels. It watches the cluster
for resource changes and notifies them through webhooks.

supported webhooks:
 - slack
 - hipchat
 - mattermost
 - flock
 - webhook
 - smtp

Usage:
  kubewatch [flags]
  kubewatch [command]

Available Commands:
  config      modify kubewatch configuration
  resource    manage resources to be watched
  version     print version

Flags:
  -h, --help   help for kubewatch

Use "kubewatch [command] --help" for more information about a command. 

For what types of resources can you get notifications?

  • Deployments
  • Replication controllers
  • ReplicaSets
  • DaemonSets
  • Services
  • Pods
  • Jobs
  • Secrets
  • ConfigMaps
  • Persistent volumes
  • Namespaces
  • Ingress controllers

When will you receive a notification?

As soon as there is an action on a Kubernetes object, such as creation, destruction or updating.

Configuration

Firstly, create a Slack channel and associate a webhook with it. To do this, go to the Apps section of Slack, search for “Incoming WebHooks” and press “Add to Slack”:

If there is no channel created for this purpose, register a new one:

In this example, the channel will be called “k8s-notifications”. Next, configure the webhook in the “Incoming WebHooks” panel by adding a new configuration, where you select the name of the channel to which notifications should be sent. Once selected, the configuration will return a “Webhook URL” that will be used to configure Kubewatch. Optionally, you can choose the icon shown on the events (“Customize Icon” option) and the name under which they arrive (“Customize Name” option).
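
Optionally, before wiring the webhook into Kubewatch, you can verify it from any machine with a couple of lines of Python (the requests library and the placeholder URL are assumptions; substitute your own Webhook URL):

import requests

webhook_url = "https://hooks.slack.com/services/<your_webhook>"  # your Webhook URL
resp = requests.post(webhook_url, json={"text": "Test from k8s-notifications"})
resp.raise_for_status()  # Slack answers 200 "ok" when the hook is valid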

You are now ready to configure the Kubernetes resources. There are example manifests, as well as the option of installing via Helm, on the Kubewatch GitHub repository. However, here we will build our own.

First, create a file “kubewatch-configmap.yml” with the ConfigMap that will be used to configure the Kubewatch container:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kubewatch
data:
  .kubewatch.yaml: |
    handler:
      webhook:
        url: https://hooks.slack.com/services/<your_webhook>
    resource:
      deployment: true
      replicationcontroller: true
      replicaset: false
      daemonset: true
      services: true
      pod: false
      job: false
      secret: true
      configmap: true
      persistentvolume: true
      namespace: false 

You simply need to enable the types of resources on which you wish to receive notifications with “true” or disable them with “false”. Also set the URL of the Incoming WebHook registered previously.

Now, for your container to have access to the Kubernetes resources through its API, register the “kubewatch-service-account.yml” file with a Service Account, a Cluster Role and a Cluster Role Binding:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kubewatch
rules:
- apiGroups: ["*"]
  resources: ["pods", "pods/exec", "replicationcontrollers", "namespaces", "deployments", "deployments/scale", "services", "daemonsets", "secrets", "replicasets", "persistentvolumes"]
  verbs: ["get", "watch", "list"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubewatch
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubewatch
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubewatch
subjects:
  - kind: ServiceAccount
    name: kubewatch
    namespace: default 

Finally, create a “kubewatch.yml” file to deploy the application:

apiVersion: v1
kind: Pod
metadata:
  name: kubewatch
  namespace: default
spec:
  serviceAccountName: kubewatch
  containers:
  - image: bitnami/kubewatch:0.0.4
    imagePullPolicy: Always
    name: kubewatch
    envFrom:
      - configMapRef:
          name: kubewatch
    volumeMounts:
    - name: config-volume
      mountPath: /opt/bitnami/kubewatch/.kubewatch.yaml
      subPath: .kubewatch.yaml
  - image: bitnami/kubectl:1.16.3
    args:
      - proxy
      - "-p"
      - "8080"
    name: proxy
    imagePullPolicy: Always
  restartPolicy: Always
  volumes:
  - name: config-volume
    configMap:
      name: kubewatch
      defaultMode: 0755 

Note that the value of the “mountPath” key is the path where the configuration from your ConfigMap will be written inside the container (/opt/bitnami/kubewatch/.kubewatch.yaml). You can find more details on mounting configurations in the Kubernetes documentation. In this example, the application is deployed as a single pod. Obviously, in a production system you would define a Deployment with the number of replicas considered appropriate, to keep the application available even if a pod is lost.

Once the manifests are ready, apply them to your cluster:

$ kubectl apply -f kubewatch-configmap.yml -f kubewatch-service-account.yml -f kubewatch.yml

The service will be ready in a few seconds:

$ kubectl get pods |grep -w kubewatch

kubewatch                                  2/2     Running     0          1m 

The Kubewatch pod has two associated containers: kubewatch itself and a proxy running kubectl proxy, the latter used to connect to the Kubernetes API.

$ kubectl get pod kubewatch -o jsonpath='{.spec.containers[*].name}'

kubewatch proxy 

Check through the logs that the two containers have started up correctly and without error messages:

$ kubectl logs kubewatch kubewatch

==> Config file exists...
level=info msg="Starting kubewatch controller" pkg=kubewatch-daemonset
level=info msg="Starting kubewatch controller" pkg=kubewatch-service
level=info msg="Starting kubewatch controller" pkg="kubewatch-replication controller"
level=info msg="Starting kubewatch controller" pkg="kubewatch-persistent volume"
level=info msg="Starting kubewatch controller" pkg=kubewatch-secret
level=info msg="Starting kubewatch controller" pkg=kubewatch-deployment
level=info msg="Starting kubewatch controller" pkg=kubewatch-namespace
... 
$ kubectl logs kubewatch proxy

Starting to serve on 127.0.0.1:8080 

You can also access the Kubewatch container to test the CLI, view the configuration, and so on:

$ kubectl exec -it kubewatch -c kubewatch -- /bin/bash

Your event notifier is now ready!

Now you need to test it. Let’s create a deployment to check that everything works:

$ kubectl create deployment nginx-testing --image=nginx
$ kubectl logs -f  kubewatch kubewatch

level=info msg="Processing update to deployment: default/nginx-testing" pkg=kubewatch-deployment 

The logs now alert you that the new event has been detected, so go to your Slack channel to confirm it:

The event has been successfully reported!

Now you can delete the test deployment:

$ kubectl delete deploy nginx-testing 

Conclusions

Obviously, Kubewatch does not replace the basic alerting and monitoring systems that every production orchestrator needs, but it does provide an easy and effective way to extend control over the creation and modification of resources in Kubernetes. In this example we configured Kubewatch across the whole cluster, “spying” on all kinds of events. Some of these are of little use if the platform is run as a service: you would be notified of every pod created, removed or updated by each development team in its own namespace, which is common, legitimate and adds no value. It may be more appropriate to filter by the namespaces for which you wish to receive notifications, such as kube-system, which generally hosts administrative services that only administrators should access. In that case, you simply specify the namespace in your ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kubewatch
data:
  .kubewatch.yaml: |
    namespace: "kube-system"
    handler:
      webhook:
        url: https://hooks.slack.com/services/<your_webhook>
    resource:
      deployment: true
      replicationcontroller: true
      replicaset: false 

Another interesting use is to “listen” to the cluster after a significant configuration change, for example to your autoscaling strategy or integration tools, since Kubewatch will notify you of every scale-up and scale-down, which can be especially useful at the start. In short, Kubewatch extends control over your clusters, and you decide how far that control reaches. In later articles we will look at how to manage logs and metrics productively.


Artificial Intelligence service for reading unstructured documents

September 11, 2020 by Bluetab


We provide a BPaaS (Business Process as a Service) service for one of the most innovative and technologically advanced financial institutions in the Spanish market. We classify dozens of types of unstructured documents, from which up to 10 different data points are also extracted, with a success rate of over 98% in document classification and over 90% in information extraction.

Our technology preprocesses the documentation, then extracts both the text and various characteristics of the document in order to classify it using machine learning. The relevant data points are then extracted according to the document type, identifying and categorising key entity information in the text using sophisticated models based on neural networks and other machine learning techniques.
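
As a highly simplified illustration of the classification step (the production service uses neural-network models and richer document features; the sketch below is a generic TF-IDF baseline with hypothetical variable names):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

doc_classifier = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=20000)),  # features over the extracted text
    ("clf", LogisticRegression(max_iter=1000)),
])
# doc_classifier.fit(train_texts, train_labels)  # extracted texts + document types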

The solution is seamlessly integrated into the client’s existing workflows, providing its back office teams with exceptional processing capacity, reducing processing times, improving quality and lowering costs.

Our approach is based on Bluetab’s own FastCapture solution, which we adapt to the use cases and specific requirements of the service provided to each client. The solution is deployed as microservices on a Kubernetes cluster in EKS, which allows us to scale the processing infrastructure elastically. Our software is developed in Elixir, using the Phoenix Framework and React for the user interfaces, and Python, Keras and TensorFlow among other machine learning tools.


Truedat in Utilities

September 11, 2020 by Bluetab


Our client is a leading multinational in the energy sector, with investments in extraction, generation and distribution and a significant presence in Europe and Latin America.

Bluetab has supported the client in the digital transformation process it is undertaking, thanks to our Truedat technological solution, which enables comprehensive management of data governance processes in Data Lake environments. The solution has allowed the client to start managing its data in the new digital environments, to generate a global business glossary and to improve its data quality and profiling activities.

The scope of the solution required working with cloud technologies, mainly serverless AWS services, and incorporating our Truedat solution into a highly advanced technological environment.


Truedat in the Insurance sector

September 11, 2020 by Bluetab


Truedat is used as a governance tool in one of the main insurance companies in Spain to comply with regulatory requirements.

To comply with these requirements, the definitions of business concepts are managed and the databases subject to regulation are automatically catalogued. Quality rules are also defined, following a quality model designed specifically for the entity.

This allows up-to-date quality reporting on regulated data. Additionally, the traceability tool is used to document the origin of the data used in anticipation of additional requirements.


Provision

September 11, 2020 by Bluetab


Within the telecommunications sector there has always been a desire to tailor the offering to each individual client, so that they obtain exactly the product they need. For countless reasons, both technical and functional, this has not been possible.

Bluetab has developed a solution for a multinational operator that makes this process of provisioning telecommunications services flexible and self-managed, improving both end-customer satisfaction and internal management of the entire workflow. Based on a flow engine, it automatically orchestrates the configuration of the telecommunications service package that a business client needs to run their business. The platform was developed on Grails 1.XX, Grails 2, Sencha (ExtJS) and Grails 3.
