
Payment Data Enrichment: From Manual Cleaning to Neural Networks

Ondřej Slivka

Cleaning data from payment transactions was initially a painstaking manual task. Seven years ago, we were learning to understand the data, googling and searching the web for information. We started with a few thousand enriched transactions a month, but gradually we algorithmized the manual work and built our data engine. Today, we process thousands of times more transactions with a high degree of automation.

What is payment data for anyway?

Payment transaction data is ugly, unstructured data that has virtually no use in its raw form. There is no uniform standard for how to record it, and on a statement you may see data like:


Our aim is to transform it into something better. That means understanding which specific business the transaction happened at and enriching the information with the exact location, logo and type of purchase.

On the left is an example of raw payment card data, which we turn into cleaned and enriched data (right).

In the beginning, it was manual work

When we started in 2013, we had to clean the data manually. We hired mothers on maternity leave to help us with this. Sometimes it resembled detective work. It's not enough just to identify the keyword "Tesco". You need to understand whether it's a specific Tesco store, a purchase at Tescoma or even a withdrawal from a Tesco ATM. Each entry needs to be properly identified.
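To illustrate why a plain keyword match falls short, here is a minimal sketch of an ordered rule list in Python. The descriptor strings, the labels and the rules themselves are hypothetical and only meant to show the idea; they are not the rules our engine uses.

```python
import re

# Hypothetical rules, checked in order; the first match wins.
# More specific patterns come before the generic brand keyword.
RULES = [
    (re.compile(r"\btescoma\b", re.I), "tescoma"),
    (re.compile(r"\batm\b.*\btesco\b|\btesco\b.*\batm\b", re.I), "tesco_atm"),
    (re.compile(r"\btesco\b", re.I), "tesco_store"),
]


def classify(descriptor: str) -> str:
    """Return a label for a raw transaction descriptor, or 'unknown'."""
    for pattern, label in RULES:
        if pattern.search(descriptor):
            return label
    return "unknown"


def naive_match(descriptor: str) -> bool:
    """The keyword check that is not enough: it also fires on 'Tescoma'."""
    return "tesco" in descriptor.lower()
```

With made-up inputs such as "TESCOMA PRAHA 4", "ATM TESCO BRNO" and "TESCO EXPRESS LIBEREC", the naive check returns True for all three, while classify tells them apart. Real descriptors are far messier, so rules like these can only be a starting point.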

Of course, this was not a scalable approach. As the business grew, we started to algorithmize and automate the work. What we used to do intuitively, we rewrote into individual technical steps.

But along with the automation of data cleansing, the error rate started to increase. We needed to introduce statistical algorithms and neural networks that could detect abnormalities and algorithm errors. It was necessary to find out why the error occurred and how to prevent it. This is long-term, systematic and consistent work.

For example, it may happen that a retail chain closes a branch in Liberec and moves the payment terminal to a store in Pilsen. But nobody updates the terminal and transactions are still recorded as Shop XY, Liberec. We need to have methods to recognize this. Statistical algorithms analyze the sequence of purchases and alert us to abnormalities.
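One simple way to picture such a check is sketched below in Python. It is an illustration under stated assumptions, not a description of our actual algorithms: the transaction fields (card, day, terminal, city), the function name and the thresholds are all hypothetical. The idea is to compare the city recorded for a terminal with the cities of the other purchases the same cards made on the same day.

```python
from collections import Counter, defaultdict


def flag_suspect_terminals(transactions, min_neighbors=3, min_share=0.8):
    """Return {terminal: likely_city} for terminals whose recorded city
    disagrees with where the same cards shopped on the same day.

    transactions: iterable of dicts with the hypothetical keys
    'card', 'day', 'terminal' and 'city'.
    """
    # Group purchases by card and day to get each card's daily sequence.
    by_card_day = defaultdict(list)
    for t in transactions:
        by_card_day[(t["card"], t["day"])].append(t)

    recorded_city = {}
    neighbor_cities = defaultdict(Counter)
    for group in by_card_day.values():
        for t in group:
            recorded_city[t["terminal"]] = t["city"]
            # Count the cities of the card's other purchases that day.
            for other in group:
                if other["terminal"] != t["terminal"]:
                    neighbor_cities[t["terminal"]][other["city"]] += 1

    suspects = {}
    for terminal, counts in neighbor_cities.items():
        total = sum(counts.values())
        city, n = counts.most_common(1)[0]
        enough_evidence = total >= min_neighbors and n / total >= min_share
        if enough_evidence and city != recorded_city[terminal]:
            suspects[terminal] = city
    return suspects
```

If several cards pay at a terminal labelled "Liberec" while all their other purchases that day are in Pilsen, the terminal is flagged with Pilsen as the likely location. A heuristic this simple would also flag terminals near city borders or on commuter routes, so its output is a prompt for review rather than a decision.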

Cleaning the data takes a lot of small steps

Terminal data is often incomplete and sometimes downright misleading. A typical case is McDonald's restaurants that operate as franchises. It is not the brand that is shown on the transaction record, but the legal entity that operates the franchise. We then have to laboriously track down who is really behind the business and correctly identify them.
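Once the operator behind a franchise has been tracked down, the result can be stored so the same legal entity is recognized next time. The Python sketch below shows one possible shape for such a lookup. The operator name "Example Gastro", the registry and the helper names are invented for illustration, and the suffix list covers only two Czech legal forms.

```python
import re

# Legal-form suffixes after punctuation is stripped (s.r.o. -> SRO, a.s. -> AS).
LEGAL_SUFFIXES = {"SRO", "AS"}

# Hypothetical, hand-curated registry: normalized operator name -> brand.
OPERATOR_TO_BRAND = {"EXAMPLE GASTRO": "McDonald's"}


def normalize(name: str) -> str:
    """Uppercase, drop punctuation and legal-form suffixes, collapse spaces."""
    cleaned = re.sub(r"[^\w\s]", "", name.upper())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def brand_for(descriptor: str):
    """Return the brand for a known operator, or None if it is not in the registry."""
    return OPERATOR_TO_BRAND.get(normalize(descriptor))
```

Here brand_for("Example Gastro s.r.o.") and brand_for("EXAMPLE  GASTRO, a.s.") both resolve to the same brand, while an unknown operator returns None and would still need manual research.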

We still need a human in the process, but most transactions are already handled by the algorithms on their own and never reach manual assessment.

From thousands to hundreds of millions

For the first few years, the number of automatically identified transactions grew only gradually. In 2018, we were only able to enrich about 100,000 transactions per month.

But as volumes increased, we had to upgrade our data engine. You can see the staircase on the chart, where we jumped up to 30% thanks to new algorithms. Today, we process over half a billion transactions per month, which is about three times the volume of data in the entire Czech Republic.

Number of identified businesses

There's still room for growth

The infrastructure of payment terminals is constantly changing. The lifetime of a terminal is 3 to 4 years. This means that roughly a quarter to a third of terminals are replaced every year, and we have to correctly identify and reassign them.
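The replacement figure follows from the terminal lifetime. As a back-of-the-envelope sketch in Python, assuming a steady state in which terminals are replaced evenly over time (an assumption made here for the arithmetic, not a measured fact), the share replaced each year is simply one divided by the lifetime:

```python
def annual_replacement_share(lifetime_years: float) -> float:
    """Share of terminals replaced per year, assuming even, steady-state turnover."""
    if lifetime_years <= 0:
        raise ValueError("lifetime_years must be positive")
    return 1.0 / lifetime_years


# A 4-year lifetime gives 25% per year; a 3-year lifetime gives about 33%.
for years in (4, 3):
    print(f"{years} years -> {annual_replacement_share(years):.0%} per year")
```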

We are improving our engine and inventing new ways to make data cleaning more accurate and even more automated. We've only just started adding really advanced algorithms and we see incredible potential.

At the same time, we're opening up foreign markets where we have to start practically from scratch. We keep discovering new problems there, but ultimately they're going to push us further.

Our huge advantage is that we are not only enriching the data, but also working with the outputs. Within the company, we have products that allow customers to run marketing campaigns that are precisely targeted simply thanks to information about consumer payment behavior.

This allows us to constantly see how valuable the data is and what can be learned from it. The connection is unique.

About the author

Ondřej Slivka

Senior insider

A seasoned B2B marketing enthusiast with 5+ years of experience sharing insights in the world of digital banking and fintech. My passion lies in crafting innovative strategies and engaging content that delivers desired results.
