Synthetic Data in Financial Services


The Use of Synthetic Data in Financial Services

“Ignoring technological change in a financial system based upon technology is like a mouse starving to death because someone moved their cheese” 

Chris Skinner, financial markets commentator and author

TL;DR: Industry leaders in the financial sector are using synthetic data to facilitate collaboration and innovation, improving fraud detection and marketing while opening up brand new revenue streams.  

The Bottom Line: 

For many companies in the finance sector, privacy fears and compliance headaches are turning valuable datasets into white elephants. They’re too risky to put to work, too precious to let go… and far too costly to leave sitting idle. 

Which is a shame, because data-driven machine learning models can be used to develop new products and revenue streams, reduce lending risk, streamline KYC processes, and predict fraud attempts with impressive accuracy. Insights and analytics from your data help streamline internal processes, predict customer churn and LTV, and design personalized marketing campaigns to boost the business. 

When you consider the lucrative potential of all that data, it’s frustrating to think of it languishing in silos. But how do you share and collaborate on data projects, internally and externally, without risking data leaks and privacy breaches? 

Anonymization isn’t reliable. Encryption undermines data utility. Generating synthetic data, on the other hand, not only overcomes the privacy liability issue, but it also addresses scalability limitations. You produce accurate, high-volume, artificial test data for DevOps at lightning speed, shortening test cycles, and reducing time to production. 

Under the Hood:

Until the 20th century, white elephants in Thailand were sacred. If one came into your possession, you had to make darn well sure you kept it safe, no matter how much it cost. Since the elephant was incredibly valuable and had to be protected, you couldn’t risk putting it to work. You couldn’t monetize it. And you couldn’t give it to anyone else. In fact, that’s precisely why kings gave these precious animals to courtiers they disliked. In theory, it was a great privilege, but in reality, it was an expensive, restrictive millstone around the recipient’s neck. 

In an age of ever-increasing data privacy legislation and cybersecurity threats, financial sector data can feel like a white elephant. 

Companies are acutely aware that if anything happens to this precious resource, they’ll incur legal, financial, and reputational risks. Many are so worried about leaks and privacy breaches that they lock their customer data away, anonymize it as best they can, and barely derive insights or value from it. They don’t dare share data between teams or collaborate on machine learning projects. They lack agility. They rack up costs. They lose competitive advantage.

But it doesn’t have to be this way. 

You don’t have to expose your white-elephant data to risks in order to derive genuine business value from it. Using AI, you can construct an entirely new, statistically accurate, synthetic dataset, based closely on the original one. This synthetic dataset has all the heft, power, and features of the underlying data, so you can use it to achieve the same real-world results. But no real-world, sensitive information is vulnerable.
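To make the idea concrete, here is a minimal sketch of about the simplest generator possible: it fits a two-column Gaussian model to a toy "real" table and then samples brand new rows from that model. The column meanings (income, monthly card spend), the function names, and the Gaussian assumption are all illustrative. Production generators typically rely on much richer models, and a simple fit like this one carries no formal privacy guarantee on its own.

```python
import math
import random
import statistics

def fit_gaussian(rows):
    """Estimate the mean, spread, and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}

def sample_synthetic(model, n, seed=0):
    """Draw n artificial rows that follow the fitted model."""
    rng = random.Random(seed)
    rho = model["rho"]
    residual = math.sqrt(max(0.0, 1.0 - rho ** 2))
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + residual * rng.gauss(0, 1)  # correlated with z1
        rows.append((model["mx"] + model["sx"] * z1,
                     model["my"] + model["sy"] * z2))
    return rows

# Toy stand-in for a sensitive table: (income, monthly card spend).
# These values are made up for illustration.
_rng = random.Random(42)
real_rows = []
for _ in range(500):
    income = _rng.gauss(50_000, 12_000)
    real_rows.append((income, 0.04 * income + _rng.gauss(0, 400)))

model = fit_gaussian(real_rows)
synthetic_rows = sample_synthetic(model, n=5_000)

# Refit on the synthetic rows to see how well the structure carried over.
refit = fit_gaussian(synthetic_rows)
print(f"real rho={model['rho']:.2f}, synthetic rho={refit['rho']:.2f}")
```

Note that the synthetic table here is ten times larger than the toy source table: once a model is fitted, you can sample as many rows as you need.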

It’s the data equivalent of making an animatronic version inspired by your white elephant – one that looks and behaves just like a real one – except you don’t have to worry about it getting out or getting hurt. No matter what you do with the synthetic one, the original stays safe. 

For the financial services industry, the implications are huge

Major financial corporations and organizations from American Express to the FCA are already using synthetic data to build sophisticated fraud-detection models, to assess a customer’s level of financial vulnerability, and to make nuanced, accurate SME lending decisions. 

While the main drivers are privacy protection and acquiring enough quality data to test and train machine learning models, synthetic datasets have other compelling benefits, too. As a research paper by JP Morgan explains, traditional machine learning techniques struggle to detect anomalies in the highly imbalanced “real” datasets used to train fraud detection models, where genuine transactions vastly outnumber fraudulent ones. With synthetic data generation, you can tackle this class imbalance by generating additional examples of the rare class, which can lead to more accurate results. 
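As a rough illustration of rebalancing, the sketch below creates extra fraud examples by interpolating between pairs of real ones. This is a simplified, SMOTE-inspired approach chosen for brevity (real SMOTE interpolates toward nearest neighbours), not the method from the paper mentioned above. The features, counts, and function name are hypothetical.

```python
import random

def oversample_minority(minority_rows, n_new, seed=0):
    """Create n_new artificial rows, each on the line between two real minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority_rows, 2)  # two distinct real fraud rows
        t = rng.random()                     # position along the segment a -> b
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Hypothetical features: (transaction amount, hour of day).
_rng = random.Random(7)
legit = [(_rng.uniform(5, 200), _rng.uniform(8, 22)) for _ in range(990)]
fraud = [(_rng.uniform(500, 3000), _rng.uniform(0, 5)) for _ in range(10)]

# Top up the fraud class until both classes are the same size.
fraud_balanced = fraud + oversample_minority(fraud, len(legit) - len(fraud))
print(len(legit), len(fraud_balanced))  # prints "990 990"
```

A model trained on `legit` plus `fraud_balanced` sees fraud as often as genuine activity. Whether that actually improves detection should be measured on untouched real data.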

You can also use insights into customer behaviors, interactions, and preferences to build perfectly personalized marketing campaigns. You can calculate risk with more confidence and precision. You can develop, sell, or white label data-backed products with fintech partners. You can analyze churn, and predict CLTV and ROI for any given campaign. You can complete KYC processes faster and with greater accuracy, delivering a frictionless onboarding experience for new customers.

In short, you can do just about anything you wanted to do with your original data but couldn’t due to compliance worries and privacy concerns. Let’s take a look at some use cases for synthetic data in financial services in more detail.

Data Sharing and Collaboration

Synthetic data generation facilitates and encourages internal data sharing, especially within organizations that store high-volume, complex, varied datasets in silos. Since you don’t have to worry about regulatory requirements, privacy protection, or employee misuse of data, you can disseminate the synthetic data among teams, collaborating on new projects. Access to the original production data remains tightly controlled. 

You have the freedom to share this data with external partners, vendors, consultants and other third parties, too. Some of the most exciting developments in the finance industry stem from collaboration between financial institutions and agile, market-changing fintech startups – often, across national borders. These relationships can be a regulatory nightmare, especially when moving sensitive data around the world or into the cloud. Synthetic data removes privacy risks, making collaborations easier, and allowing projects to progress faster.

Innovating with Data 

A shortage of high-quality historical data is a constant headache for data scientists developing and training machine learning models. Add to this complex mathematical anonymization, including techniques like perturbation or homomorphic encryption, and using what sensitive datasets you do have becomes very difficult indeed. 

This is especially true given that any technique used to provide additional context, like augmenting internal datasets with external data, could make re-identification even easier. It’s harder still when data scientists are racing against the clock to derive insights and long-term patterns from internally stored, historical data before the data retention window runs out, as per GDPR rules. 
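The re-identification risk can be made tangible with a small k-anonymity check: count how many records share each combination of indirectly identifying attributes (quasi-identifiers). The records, field names, and helper functions below are hypothetical and only meant to show how each extra attribute shrinks the crowd a person can hide in.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def smallest_group(records, quasi_identifiers):
    """The 'k' in k-anonymity: size of the rarest combination (1 means someone is unique)."""
    return min(group_sizes(records, quasi_identifiers).values())

# Hypothetical "anonymized" customer records: names removed, attributes kept.
records = [
    {"zip": "10001", "birth_year": 1980, "balance": 1200},
    {"zip": "10001", "birth_year": 1980, "balance": 560},
    {"zip": "10001", "birth_year": 1975, "balance": 8900},
    {"zip": "10002", "birth_year": 1990, "balance": 40},
    {"zip": "10002", "birth_year": 1990, "balance": 310},
    {"zip": "10002", "birth_year": 1962, "balance": 15000},
]

k_zip = smallest_group(records, ["zip"])
k_zip_year = smallest_group(records, ["zip", "birth_year"])
unique = sum(1 for size in group_sizes(records, ["zip", "birth_year"]).values()
             if size == 1)
print(k_zip, k_zip_year, unique)  # prints "3 1 2"
```

With the ZIP code alone, every customer blends in with at least two others. Add the birth year, the kind of detail an external dataset could supply, and two customers become unique.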

Synthetic data provides an excellent alternative. Not only can you use it as test data and combine it with external datasets without exposing real people’s sensitive data, but you can also scale it up as required, ensuring you have enough to train ML models effectively. 

What’s more, from a DevOps point of view, this artificially generated data has the added bonus of shortening your time to production. You don’t have to wait to collect or acquire real-world data to test your models. This means decreased test times, increased flexibility, and far more agile development. The synthetic dataset mimics the statistical properties of the source production data, so you can use it to validate models and test new products, services, and software performance. 
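Before relying on a synthetic dataset for testing, it is worth checking that it really does mimic the source. One simple way, sketched below under illustrative assumptions, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical distributions of a column. The sample data and variable names are made up, and what counts as "close enough" is a judgment call for each project.

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for value in a + b:
        cdf_a = bisect.bisect_right(a, value) / len(a)
        cdf_b = bisect.bisect_right(b, value) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical transaction amounts: one real column and two candidate synthetic ones.
_rng = random.Random(1)
real_amounts = [_rng.gauss(100, 20) for _ in range(1000)]
faithful = [_rng.gauss(100, 20) for _ in range(1000)]  # same distribution
drifted = [_rng.gauss(130, 20) for _ in range(1000)]   # mean shifted upward

ks_faithful = ks_statistic(real_amounts, faithful)
ks_drifted = ks_statistic(real_amounts, drifted)
print(f"faithful: {ks_faithful:.3f}, drifted: {ks_drifted:.3f}")
```

A small value for the faithful column and a large one for the drifted column is the pattern to look for. A check like this covers one column at a time, so relationships between columns need a separate comparison.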

Monetizing Your Data

Generating business-critical insights can deliver solid ROI when it comes to improving internal processes and operations, minimizing risk, fine-tuning marketing campaigns, and so on. But generating synthetic data based on your production data also opens up opportunities to develop new revenue streams. 

For example, let’s say you have an extensive repository of credit card data, including transaction amounts, times, and locations. This information would be extremely useful to the retail sector – and potential buyers are prepared to pay a lot for it. 

For obvious reasons, you can’t sell the original data. Packaging and selling customer data to third parties is heavily regulated. Even if you anonymized it, the risk of re-identification would be enormous. You could, however, sell a synthetic dataset of aggregate values that, in terms of its statistical makeup, is indistinguishable from the real one. As such, you capitalize on a lucrative new revenue-generation opportunity without putting user privacy at risk. 

Final Thoughts: Realizing the Potential of Your Data

Financial institutions worldwide are racing to keep up with the pace of digital transformation, at a time when market trends are dynamic and unpredictable. 

You need to ensure you can acquire sufficient amounts of recent, relevant, historical data to give you insights and predictions that aren’t already out of date by the time you reach them. You need to make sure you can meet customer expectations, delivering personalized marketing, great UX, and industry-leading products. You need to make use of data architecture and cloud-based infrastructure to stay agile and competitive.

These things are only achievable if you can mine every bit of value possible from your datasets. It’s vital that you find a way to keep the original production data safe, ensuring total privacy while generating synthetic data for all your operational and innovation needs. This is no time to be dragged down by a decorative white elephant. To compete, you need your data to work harder than ever before.

Curious to see how this could work for your data? Contact us to request a demo today!