Wealth management firms swim in data. Whether it’s regulatory reporting, AML/KYC, portfolio management, or research and analysis, the sheer amount of data becomes unwieldy as a firm grows – and these categories merely scratch the surface.
Even small firms need storage for hundreds (if not thousands) of documents. Without a system to organize and manage all this data, overall productivity declines as the data grows, because there’s more and more to search through.
There’s a technical term for these catch-alls: “data lakes.” As the name suggests, these are simply vast stores of raw data that are often not well categorized. Instead, metadata makes navigation through a data lake manageable, allowing users to perform queries to find what they need.
There’s no practical limit to the amount of data you can store, because most organizations opt for scalable solutions, most commonly cloud storage. There’s one problem, though: finding a specific data point will soon be like finding a shipwreck in the ocean. In this industry, time is money, and that’s why data warehouses are the better choice for wealth management.
We recently produced the fifth webinar in the series in partnership with Xtiva Financial Systems, “Data Pollution: Don’t Let Your Data Lake Get Turned Into A Swamp,” which included panelist Yaela Shamberg, co-founder and Chief Product Officer of InvestCloud.
In this article, we summarize Yaela’s insights on the state of data storage in the financial industry, when and how data lakes become a drag on your business, and what firms can do to prevent that from happening in the first place if it hasn’t already.
Data Lakes vs. Data Warehouses
To better understand how data lakes and data warehouses differ, it helps to compare them side by side. Enterprise data solutions provider Talend says there are four primary differences between a data lake and a data warehouse:
- Data: In a data lake, data is stored in ‘raw’ (unprocessed) formats, often with little organization other than whatever information is placed in the metadata, regardless of its usefulness. Data in a warehouse, on the other hand, is ‘processed’: curated, organized, and discarded if it’s unnecessary.
- Purpose: The immense amount of data found in data lakes may not necessarily have a purpose yet (this also makes it perfect for machine learning applications). However, data warehouses only store information with a specific business purpose.
- Target user: Data lakes aren’t intended for use by the everyday employee. Querying this data isn’t like using a search engine and typically requires experience in working with and understanding vast amounts of data. On the other hand, data warehouses are easily queried and usable by anyone who knows the data’s subject matter.
- Usage: The lack of structure makes data lakes considerably easier to change, since the raw data is easily updatable and the entire data store is scalable, even if it’s much more challenging to use. Data warehouses, by contrast, are highly structured; while they are easier to navigate, the data within isn’t as easily manipulated.
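To make the contrast in the list above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how Talend or InvestCloud build their systems: the record layout, the metadata tags, and the helper names `find_in_lake` and `load_warehouse` are all hypothetical. The "lake" keeps raw payloads that can only be found through metadata, while the "warehouse" keeps only curated, typed position records that answer a business question with one query.

```python
import json
import sqlite3

# Hypothetical "data lake": raw payloads stored as-is, findable only via metadata tags.
lake = [
    {"meta": {"source": "custodian_feed", "type": "positions", "date": "2023-01-31"},
     "raw": '{"acct": "A-100", "sym": "VTI", "qty": "120", "px": "200.50"}'},
    {"meta": {"source": "crm_export", "type": "client_note", "date": "2023-01-30"},
     "raw": "Client called about rebalancing."},
    {"meta": {"source": "custodian_feed", "type": "positions", "date": "2023-01-31"},
     "raw": '{"acct": "A-200", "sym": "BND", "qty": "50", "px": "72.00"}'},
]


def find_in_lake(store, **tags):
    """Return raw payloads whose metadata matches every given tag."""
    return [item["raw"] for item in store
            if all(item["meta"].get(key) == value for key, value in tags.items())]


def load_warehouse(store):
    """Curate position records into a typed table; everything else is left out."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE positions (account TEXT, symbol TEXT, quantity REAL, price REAL)"
    )
    for raw in find_in_lake(store, type="positions"):
        rec = json.loads(raw)
        conn.execute(
            "INSERT INTO positions VALUES (?, ?, ?, ?)",
            (rec["acct"], rec["sym"], float(rec["qty"]), float(rec["px"])),
        )
    conn.commit()
    return conn


# Lake-style access: the caller must know the tags and then parse the raw text.
raw_positions = find_in_lake(lake, type="positions")

# Warehouse-style access: one query over a structured, purpose-built table.
conn = load_warehouse(lake)
total_value = conn.execute("SELECT SUM(quantity * price) FROM positions").fetchone()[0]
```

Note that the client note never reaches the warehouse. That is the trade-off the list describes: the warehouse is easier to query, but only for the data someone decided was worth modeling.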
Some technologists place some fairly strict guard rails on the differences between the two, but Shamberg noted during her remarks that the term “data lake” can be generic – and you might not know what you’re getting into.
One organization’s data lake could be just a massive store of unorganized data. Another organization’s data might be partially organized. It could even be several technology platforms combined – where the data in each platform is organized but is siloed from other data stores.
In October of 2010, James Dixon, founder and former CTO of Pentaho, came up with the term “Data Lake.” Dixon argued Data Marts come with several problems, ranging from size restrictions to narrow research parameters. In describing his concept of a Data Lake, he said:
“If you think of a Data Mart as a store of bottled water, cleansed and packaged and structured for easy consumption, the Data Lake is a large body of water in a more natural state. The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”
“When you think about data lakes, it’s ambiguous in terms of what you’re going to get,” Shamberg noted. At InvestCloud, Shamberg and others think about more than just the data in the data store itself. In addition to the data warehouse, they look at the structures, the tables, and how it all works together.
“One (data lakes) is a storing mechanism and a collection vehicle,” Shamberg said. “The other (data warehouses) is how to leverage and use your data to power solutions for your business.”
Organizing data this way increases productivity.
Shamberg says data warehouses should be planned out in a ‘methodical, well thought out and modeled’ way. “If you have a strong and resilient data model, one that’s deep and wide, ultimately you will reach more of your objectives than just mashing all your data together.”
When the data lake becomes a swamp
While data is increasingly important for firms to stay competitive in today’s market, without organization and structure, it can quickly become an albatross.
As we noted at the beginning, the most significant disadvantage of data lakes is that they become unwieldy as they grow larger. Data gets buried and becomes hard to navigate when that happens.
Data Lakes can create more problems than they solve. According to Adam Wray, CEO of Basho, a developer of open source distributed databases, data lakes are simply “evil”. Companies should view their data through a supply chain prism with a beginning, middle and end. All data needs to be collected, found, explored and transformed according to an organized plan, which maximizes the value that can be extracted.
Shamberg recalled a meeting with a potential InvestCloud client where those frustrations bubbled to the surface. “The technology team worked very hard for this organization and said, no, we have a data lake,” she recalled. “The business head on the other end of the table looks at him and says, ‘do you mean the swamp?’ In that case, it wasn’t working for them.”
This often happens due to poor planning or adopting technologies simply for technology’s sake. If a new technology doesn’t fit in with your pre-existing technologies, follow the same model or structure, or relate to other data stores, its data can become extremely hard to find.
Older or larger companies may have a similar problem. Here, the silo effect occurs due to software upgrades or transitions to more robust software platforms as the business grows. While the data sets might be similar, they don’t exactly “line up,” Shamberg argued.
“It can be an ornate set of opportunities or problems to solve,” she said, circling back to her points on having a detailed data strategy. “I’m partial to data and digital warehouses because of the extensive data model that you don’t get from most off-the-shelf solutions either because they don’t understand your industry or data enough.” (See Integration by Design: Building Flexible Data Platforms)
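The silo problem described above, where similar data sets don’t quite “line up,” can be sketched in a few lines of Python. This is only an illustration: the two systems, their field names, and the `from_legacy` and `from_new` mapping functions are hypothetical, and a real data model would cover far more than three fields. The idea is to map each system’s records into one shared shape before comparing them.

```python
# Two hypothetical systems holding the same client under different conventions.
legacy_crm = [{"ClientID": "00123", "FullName": "Jane Doe", "AUM_USD": "1,250,000"}]
new_platform = [{"client_id": 123, "name": "DOE, JANE", "aum": 1250000.0}]


def from_legacy(row):
    """Map a legacy CRM row into the shared (hypothetical) client model."""
    return {
        "client_id": int(row["ClientID"]),
        "name": row["FullName"].strip().title(),
        "aum": float(row["AUM_USD"].replace(",", "")),
    }


def from_new(row):
    """Map a new-platform row ("LAST, FIRST" names) into the shared model."""
    last, first = [part.strip() for part in row["name"].split(",")]
    return {
        "client_id": int(row["client_id"]),
        "name": f"{first} {last}".title(),
        "aum": float(row["aum"]),
    }


# Group the normalized records by client so they can be compared directly.
unified = {}
normalized = [from_legacy(r) for r in legacy_crm] + [from_new(r) for r in new_platform]
for record in normalized:
    unified.setdefault(record["client_id"], []).append(record)

# Clients whose records still disagree after normalization need a human look.
mismatched = [cid for cid, recs in unified.items() if any(r != recs[0] for r in recs)]
```

Once both records share one shape, the two systems agree on this client. Without the mapping step, a naive comparison of the raw rows would report them as entirely different.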
The larger the firm, the more data ‘needy’ they are
Shamberg also shared that the larger the firm, the needier they usually are around data. While smaller firms generally fared better in InvestCloud’s introductory analysis of their data practices, the larger the firm was, the less they understood their own data.
This isn’t limited to firms that simply neglected to manage their data stores. One client of Shamberg’s was a “sophisticated wealth manager” with a global presence. Although that might sound surprising, she cited a variety of reasons, some of which may be out of a firm’s control.
“The more legacy systems they have, the more technological debt (tech debt) they have, and even the higher their employee turnover is, the less they know their data,” she explained. These firms are also more difficult to transition to better data management because the firm still must be able to service its employees and clients while the transition is underway.
This could be why so many firms put off fixing their data woes. Shamberg argues that isn’t a reason to procrastinate. Services like InvestCloud can help you drain the data swamp and better understand your data, she said.
In a recent McKinsey survey, CIOs reported that 10 to 20% of their annual budget for new technology products is diverted to resolving issues related to tech debt. They also estimated that tech debt amounts to 20 to 40% of the value of their entire technology infrastructure, before depreciation. For larger organizations, this translates into hundreds of millions of dollars. And things are not improving: 60% of the CIOs surveyed felt their organization’s tech debt had risen perceptibly over the past three years.
“Technology is not why things fail in terms of project implementation,” Shamberg concluded. “I believe it’s either taking the wrong approach or not knowing how to overcome your primary data barriers.” (See 3 Smart Strategies for Corralling Your Data Infrastructure)
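To put the McKinsey percentages cited above into dollar terms, here is a small worked example in Python. The budget and infrastructure figures are assumptions chosen purely for illustration; they do not come from the survey or from any firm mentioned in this article.

```python
def pct_range(amount, low_pct, high_pct):
    """Return the (low, high) dollar amounts for a percentage range of `amount`."""
    return amount * low_pct // 100, amount * high_pct // 100


# Hypothetical figures for a large firm (assumptions, not survey data).
new_tech_budget = 50_000_000          # annual budget for new technology products
infrastructure_value = 1_000_000_000  # value of the technology estate

# 10 to 20% of the new-technology budget diverted to tech debt issues.
diverted_low, diverted_high = pct_range(new_tech_budget, 10, 20)

# Tech debt estimated at 20 to 40% of the value of the technology estate.
debt_low, debt_high = pct_range(infrastructure_value, 20, 40)

print(f"Diverted each year: ${diverted_low:,} to ${diverted_high:,}")
print(f"Estimated tech debt: ${debt_low:,} to ${debt_high:,}")
```

Under these assumed figures, $5 million to $10 million of the annual budget goes to servicing tech debt, and the debt itself falls between $200 million and $400 million, which is consistent with the “hundreds of millions of dollars” the survey describes for larger organizations.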
Draining the data swamp the right way
While some competitors allow clients to customize their data models, InvestCloud does not. Shamberg said this lets the company provide the same level of service regardless of firm size, giving every client a level of data expertise that would otherwise be outside the budget of all but the biggest firms.
InvestCloud’s data model features more than a decade of work, supports hundreds of clients across the financial space, and is already well developed – so it’s likely little customization would be necessary anyway. It also allows InvestCloud to ensure that future updates don’t run into compatibility issues or, worse yet, fail altogether.
This doesn’t mean InvestCloud isn’t receptive to customer requests. “We’ll work with you to extend it,” Shamberg responded. “But go on this journey with us because it’s a bigger boat and a pretty big ocean versus your small vessel that sometimes can get a little wobbly, depending on what the waters look like.” (See Speaking Their Language: Integrating Insurance Data Into Wealth Management Systems)
Buy-in from the business side is important, too
Whatever changes you make require buy-in not only from the IT side but also from the business side. In Shamberg’s case, while the CTO is more than welcome to attend those introductory meetings, InvestCloud won’t hold them without the business side in attendance. This way, everyone is on the same page, especially considering some decisions won’t be purely technology-related.
“Having that accountability all the way up in that communication line and that governance to say, ‘Okay, let’s take a different approach. Let’s take a step back,’” she concluded. “Having some of those difficult or candid conversations is important, but having the business involved is hands down the simplest and most fundamental thing I’ve seen to a positive outcome.” (See 4 Benefits of a 360-Degree Client View)