4 Tips to Detox, Cleanse and Validate Your Data

In the relentless pursuit to provide better services and simultaneously better understand their customers, wealth management firms are splurging on data. 78% of companies said they plan to use more market and reference data over the coming 12 months, according to a survey by global data and technology vendor, Refinitiv, a company that manages 90 million instruments on its platform.

Spending more on data won’t pay off if that data isn’t cleaned, validated and properly maintained. This is especially true when implementing artificial intelligence / machine learning platforms, which require tremendous amounts of data on which to be trained.

The reality is that the value of your firm’s data is often greater than the sum of its parts. In the financial services sector specifically, the use of data enables delivery of greater value and personalized services to clients and helps address business challenges, such as fraud.

But without a plan to handle your data, you’ll soon have a mess on your hands. Each data supplier has a different structure for what they send, and it’s up to you to ensure that dirty data is not fed into your key systems.

The processes of data validation and data cleansing ensure that errors are detected early in the ingestion process. Missing, malformed, incomplete, and duplicate values must be either corrected, deleted, or replaced.
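To make the idea concrete, here is a minimal Python sketch of an early-stage cleansing pass. It is an illustration only: the field names (symbol, price, date), the date format, and the choice to reject bad rows rather than repair them are all assumptions, not a description of any particular vendor feed or of Softlab360's process.

```python
from datetime import datetime

def cleanse(records):
    """Split raw records into clean rows and rejected rows with a reason.

    Field names and the YYYY-MM-DD date format are assumptions for this sketch.
    """
    clean, rejected, seen = [], [], set()
    for rec in records:
        symbol = (rec.get("symbol") or "").strip().upper()
        price_raw = rec.get("price")
        date_raw = rec.get("date")

        # Missing or incomplete values
        if not symbol or price_raw in (None, "") or not date_raw:
            rejected.append((rec, "missing value"))
            continue

        # Malformed values: price must be numeric, date must parse
        try:
            price = float(price_raw)
            date = datetime.strptime(date_raw, "%Y-%m-%d").date()
        except (TypeError, ValueError):
            rejected.append((rec, "malformed value"))
            continue

        # Duplicates: same symbol and date already accepted
        key = (symbol, date)
        if key in seen:
            rejected.append((rec, "duplicate"))
            continue

        seen.add(key)
        clean.append({"symbol": symbol, "price": price, "date": date})
    return clean, rejected
```

Keeping the rejected rows alongside a reason, instead of silently dropping them, leaves a trail that staff can use to decide whether a value should be corrected, deleted, or replaced.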

Clean data is required in order to take advantage of advanced capabilities such as predictive analytics. Harnessing the power of AI/ML, predictive analytics enables firms to better understand their clients’ needs and proactively suggest strategies to maximize returns.

With AI/ML adoption set to take off, we sat down with Softlab360 CEO Henry Zelikovsky to ask about preparing your data for next-generation technologies like predictive analytics.

Tip #1: Understand the Data That You’re Supposed to Receive.

Before cleansing and validation can begin, you need to understand exactly what data you are supposed to be receiving. Because of the sheer number of data feeds that most wealth management firms bring in (and the ever-expanding amount of data contained within them), having a complete inventory is critical before starting.

You will need to understand how the sender formats their data, what they will send, and how often they will send it. “Each supplier has their own methods,” Zelikovsky said. “Companies need to adapt to receive that data.”

It used to be common for most data to be sent as end-of-day or start-of-day files. But the rise of application programming interfaces (APIs) has increased the availability of intraday and even real-time delivery. APIs also enable firms to request just the data they need rather than having to scan through large flat files.

Even with easier methods, the incoming data is still not of high-enough quality to be of much use, so it must be validated. And the most efficient and scalable way to do that is through automation.

Tip #2: Automate, Automate, Automate.

According to data vendor Experian Data Quality, some of the benefits of automating data validation include:

  • Capacity to verify huge datasets
  • Less allocation of internal human resources to the maintenance and upkeep of data
  • Extremely efficient in terms of setting up logical rules to run automatically
  • Ability to cleanse data in real time or customized scheduled cleanses
  • Flexibility to act on the front- and back-end of your collection channels

In a perfect world, all data feeds would be perfectly formatted. Unfortunately, they’re not, which makes validation challenging. Zelikovsky puts the scale of the task into perspective:

As a result, Zelikovsky strongly recommends that firms automate their data validation processes, and he highlighted two areas on which to focus:

  • Completeness: Automated cleansing and validation processes should identify the beginning and end of feeds and whether the transmitted data was received in full. The process should also be able to identify and recover from any delivery failures without human intervention, if possible.
  • Notifications: While the goal of automation is to remove humans from the process, things can still go wrong. The system should be designed to notify administrators when the process starts and ends – and if anything went wrong (and what was done about it).
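The two focus areas above can be sketched in a few lines of Python. This is a hedged illustration, not a real feed specification: the "HDR|" and "TRL|" marker records, the record count carried in the trailer, and the notify callback are all hypothetical conventions chosen for the example.

```python
class IncompleteFeed(Exception):
    """Raised when a feed fails a completeness check."""

def check_completeness(lines, notify=print):
    """Return the data rows if the feed arrived in full, else raise IncompleteFeed.

    Assumed (hypothetical) layout: first line "HDR|<feed name>",
    last line "TRL|<record count>", data rows in between.
    """
    notify("validation started")
    problem = None
    if not lines or not lines[0].startswith("HDR|"):
        problem = "header record missing"
    elif len(lines) < 2 or not lines[-1].startswith("TRL|"):
        problem = "trailer record missing, feed may be truncated"
    else:
        count_text = lines[-1].split("|", 1)[1]
        rows = lines[1:-1]
        if not count_text.isdecimal():
            problem = "trailer record count is not a number"
        elif int(count_text) != len(rows):
            problem = f"expected {int(count_text)} rows, received {len(rows)}"

    if problem:
        # Tell administrators what went wrong before stopping
        notify(f"validation failed: {problem}")
        raise IncompleteFeed(problem)

    notify(f"validation finished: {len(rows)} rows received")
    return rows
```

In practice the notify callback would post to email, chat, or a monitoring system; print is used here only so the sketch runs on its own.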

Bottom line, the goal is to automate: retry if there was a problem, reset and start from the beginning or continue from the point of failure, and try to finish in the time allotted for such work, Zelikovsky stated.
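One way to picture "continue from the point of failure" is a simple checkpoint. The Python sketch below is an assumption-laden illustration: the function name, the in-memory checkpoint, and the fixed retry count are invented for the example, and a production system would persist the checkpoint and also enforce the processing deadline.

```python
import time

def process_with_resume(batches, handle, max_attempts=3, delay_seconds=0.0, notify=print):
    """Process batches in order; after a failure, retry from the failed batch.

    Returns True if every batch was processed, False if attempts ran out.
    """
    checkpoint = 0  # index of the next batch that still needs processing
    for attempt in range(1, max_attempts + 1):
        try:
            for i in range(checkpoint, len(batches)):
                handle(batches[i])
                checkpoint = i + 1  # advance only after the batch succeeds
            notify(f"completed on attempt {attempt}")
            return True
        except Exception as exc:
            notify(f"attempt {attempt} failed at batch {checkpoint}: {exc}")
            time.sleep(delay_seconds)  # brief pause before retrying
    notify("giving up; manual intervention needed")
    return False
```

Resuming from the checkpoint avoids reprocessing batches that already succeeded, which matters when the whole run has to fit inside a fixed overnight window.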

Zelikovsky doesn’t advocate for complete automation, however. “There’s usually at least a few people on the desk to supervise the processes,” he noted.

These human monitors ensure that delivery starts and ends at the expected time and that the system resets properly in case of an error. If it does not, these people contact the sender and attempt to fix the problem before advisors/users log in the next day.

All of the above must happen in a tight timeframe, typically between 7:00 p.m. and 1:00 a.m. ET and sometimes later. The goal should be to complete this by 5:00 a.m. ET the following day so advisors and operations staff have what they need to do their jobs.

Zelikovsky says data suppliers have made it easy for you to get started on validation and cleansing well before the end of the day. (See 7 Tips for Improving Your Firm’s Data Analytics)

Tip #3: Take Advantage of Intraday Reporting.

As intraday reports become commonplace, you’ll be receiving data in smaller batches. Take advantage of this to get a head start on validation and cleansing before the evening data crush, Zelikovsky advised.

“There are several schools of thought when that’s appropriate,” he said. “But our experience shows that you should deal with incoming data immediately.”

He cautioned that this doesn’t necessarily mean that the workload will be less, but that it is important to maintain a high level of efficiency since data validation and cleansing can be resource-intensive. Spreading it throughout the day helps tamp down processing requirements and can reduce cloud computing costs.

Speed and accuracy are equally important when it comes to processing data. Taking advantage of intraday reports to get a head start on the day’s reconciliations goes far towards meeting that goal – and gives you more time to resolve problems should they arise. (See Running Up the Score: How Predictive Analytics Gives You an Advantage Over Your Competitors)

Tip #4: Standardize with a Mapping Facility.

Even though automation helps to process large amounts of data efficiently, it needs to be paired with a service that understands the data format of different providers and can help standardize the myriad of feeds into a single unified layout.

A data mapping facility is a service that sits between your firm and data providers. First, it inspects the data to ensure it’s the correct type (e.g., numeric, text, date/time), as well as the correct precision and even the proper range (e.g., unless it’s Berkshire Hathaway, an equity price shouldn’t be in the hundreds of thousands of dollars).
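The type, precision, and range checks described here might look something like the following Python sketch. The function name, the four-decimal precision limit, and the default price ceiling of 100,000 are assumptions picked for illustration; real limits would depend on the security type and the feed.

```python
from decimal import Decimal, InvalidOperation

def validate_price(raw, max_decimals=4, low=Decimal("0"), high=Decimal("100000")):
    """Return a list of problems found in a raw price field (empty if none).

    The default limits are illustrative assumptions, not industry rules.
    """
    # Type check: the field must parse as a finite number
    try:
        price = Decimal(str(raw).strip())
    except InvalidOperation:
        return ["not numeric"]
    if not price.is_finite():
        return ["not numeric"]

    problems = []
    # Precision check: count digits after the decimal point
    if -price.as_tuple().exponent > max_decimals:
        problems.append("too many decimal places")
    # Range check: price must be positive and at or below the ceiling
    if not (low < price <= high):
        problems.append("out of range")
    return problems
```

A caller that knows a security legitimately trades above the default ceiling can pass a higher limit, for example validate_price("250000", high=Decimal("1000000")), rather than loosening the check for every instrument.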

“The mapping facility’s work is critical because if it doesn’t work correctly day in and day out, the firm’s account holdings, assets, positions, pricing, and currencies would all be inaccurate,” Zelikovsky said. “Worse yet, any reporting and predictive analysis that rely on that data would also be inaccurate.”

The most obvious job for the mapping facility is organizing and mapping the security masters of the various exchanges (a topic we’ll discuss in a future blog post). Each security that comes through any data feed must be identified and collated with appropriate pricing and trade data.

As alternative investments become more widely held, it is important that data processing applications can support a wide range of security types beyond just equities, mutual funds, and fixed income, Zelikovsky said. “Knowing what source supplies what data and adapting to each source is the mapping facility’s primary job.”
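Adapting to each source can be pictured as a lookup table per supplier. The Python sketch below is purely illustrative: the source names (custodian_a, custodian_b), their field names, and the unified layout are hypothetical and do not reflect any real provider's format.

```python
# Hypothetical per-source field maps: unified name -> name used by that source.
FIELD_MAPS = {
    "custodian_a": {"symbol": "Ticker", "quantity": "Qty", "price": "Px"},
    "custodian_b": {"symbol": "sec_id", "quantity": "units", "price": "close_price"},
}

def to_unified(source, record):
    """Translate one source-specific record into the unified layout."""
    field_map = FIELD_MAPS.get(source)
    if field_map is None:
        raise ValueError(f"no mapping defined for source {source!r}")

    # Flag records that lack a field the unified layout needs
    missing = [src for src in field_map.values() if src not in record]
    if missing:
        raise ValueError(f"{source} record is missing fields: {missing}")

    unified = {name: record[src] for name, src in field_map.items()}
    unified["source"] = source  # keep track of where the data came from
    return unified
```

With the maps held as data rather than code, supporting a new supplier or a new security type means adding an entry to the table instead of writing another parser.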

Zelikovsky recommended that anyone responsible for monitoring data validation and cleansing processes should also understand what correct data looks like. “If there is missing data, they need to know how to find it in the source itself and understand what’s wrong, correct it, and keep the process moving,” he said. (See Big Data as a Service is the Next Big Thing in Artificial Intelligence)

For more information about how a partnership with Softlab360 can expand your predictive potential and give you a competitive advantage, visit www.Softlab360.com.



The Wealth Tech Today blog is published by Craig Iskowitz, founder and CEO of Ezra Group, a boutique consulting firm that caters to banks, broker-dealers, RIAs, asset managers and the leading vendors in the surrounding #fintech space. He can be reached at craig@ezragroupllc.com.