A Brief Introduction to Synthetic Data for Data Monetization

Monetizing data is an international obsession, spanning just about every sector and area of business. The economic pressures of Covid-19 have only accelerated the trend, as companies around the world intensify efforts to squeeze value out of their datasets to make themselves more resilient and competitive. The global data monetization market, worth around $1.9 billion in 2020, is now expected to swell to $4.8 billion by 2027.

Everyone’s getting in on the act, looking for smart, lucrative ways to use their data assets to rise above the competition. Deloitte predicts that, from 2021, sports teams will prioritize collecting and distributing data as products for third parties. Telecoms companies are seeking more effective ways to turn customer data into insights they can use for carefully targeted advertising campaigns. According to Accenture, 80% of the value derived from AI-driven data monetization in the insurance sector comes from new revenue streams, operational improvements and business growth, while the remaining 20% comes from improving trust, efficiency and data quality, so that existing data can be used to better effect.

Financial institutions are no less intrigued by the prospect of monetizing their data, but they are, of course, cautious. Any innovation involving sensitive customer data creates disclosure risks and compliance headaches, especially when you need to share these sensitive datasets with internal or external partners, or when you must remove layers of encryption so that the data can be used to train machine learning models.

So what can banks and financial companies do to get past these challenges and exploit the potential of data monetization?

The first step is synthetic data generation. Using privacy-preserving synthetic data (rather than the original datasets) means you instantly eliminate the disclosure risks involved in training and testing predictive models with financial data. You get all the insights and benefits of your “real”, underlying data, but you don’t have to hide a synthetic dataset away behind privacy-enhancing technologies (PETs), or restrict who you collaborate with on data-driven projects, because the data doesn’t lead back to any real people.

In fact, you could potentially take your synthetic data monetization strategy even further by repackaging and selling the synthetic data to third parties. Again, even though the synthetic data is just as useful and revealing about customer trends and behaviors as authentic data, it doesn’t belong to anyone but you. There’s no one to ask consent from to sell or share it, so you don’t run into the same GDPR issues. Imagine the potential range of revenue streams that could open up for your business. 

What’s more, generating synthetic data doesn’t just mean replicating a dataset so that it can be used without compliance limitations, or improving data security without undermining agility. You can also use this strategy to produce more of the data you need for your AI-driven data monetization projects. You can increase the overall volume of data, produce more of the types of data you’re lacking, or even model scenarios that haven’t happened yet.
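
To make the idea of "producing more of the types of data you’re lacking" a little more concrete, here is a deliberately simplified Python sketch. It is an illustration only, not how any particular vendor works: the column names (`amount`, `label`), the sample figures and the `synthesize` helper are all hypothetical, and fitting one Gaussian per label is a toy stand-in for a real generative model. On its own it offers no privacy guarantee.

```python
import random
import statistics


def fit_gaussian(values):
    """Summarize a numeric column by its mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


def synthesize(rows, n_per_label, seed=0):
    """Draw new (amount, label) rows from one Gaussian fitted per label.

    rows: list of (amount, label) tuples from a hypothetical source dataset.
    n_per_label: number of synthetic rows to draw for each label.
    """
    rng = random.Random(seed)

    # Group the source amounts by label.
    by_label = {}
    for amount, label in rows:
        by_label.setdefault(label, []).append(amount)

    synthetic = []
    for label, amounts in by_label.items():
        mu, sigma = fit_gaussian(amounts)
        for _ in range(n_per_label):
            # Clamp at zero so no synthetic amount is negative.
            synthetic.append((max(0.0, rng.gauss(mu, sigma)), label))
    return synthetic


# Hypothetical, hand-made source data in which "fraud" is under-represented.
source = [
    (20.0, "normal"), (35.5, "normal"), (18.2, "normal"),
    (42.0, "normal"), (27.3, "normal"),
    (910.0, "fraud"), (1250.0, "fraud"),
]

# Ask for the same number of rows per label to rebalance the classes.
balanced = synthesize(source, n_per_label=100)
```

In this sketch, `balanced` contains equal numbers of "normal" and "fraud" rows even though the source had only two fraud examples. A production-grade generator would model the relationships between many columns at once and add explicit privacy safeguards, which this toy example does not attempt.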

Best of all, if you’re working with a top vendor, the resulting synthetic dataset will be extremely high in quality. It will be delivered in the best format for your project, either labeled or unlabeled, as per your needs. It won’t be riddled with inaccuracies, missing values and other problems that need fixing before you can get started. 

This makes the ROI potential even higher. You cut costs associated with data preparation and shorten your time to model deployment, allowing you to start monetizing your data sooner. In other words, synthetic data both enables and enhances data monetization, especially in an industry as tightly regulated as finance. 


Even if you haven’t got a fancy new machine learning-driven product or data-driven, alternative revenue stream figured out just yet, embracing synthetic data is a vital first stage of any data monetization strategy. Consider for a moment how much your organization spends on looking after its data assets. On security infrastructure. On attracting the right kind of talent into your company and retaining that expertise, ensuring you never suffer a leak or breach. How can you justify all that expense if your data is just going to languish in a silo or under layers of encryption forever? Or if you’re going to have to delete it in a few years’ time, in line with GDPR? Generating synthetic data means that you are guaranteed to derive an asset from your data that you can actually put to work, whenever you need it. Without that, you’ll never be able to monetize your data at all. 


