It’s become a cliché that the amount of data being captured by the largest companies has been growing at an exponential rate.
On average, data professionals report that the volume of data being stored is growing by more than 60% per month! The top ten percent of companies reported data volume growth of 100% or more per month. That’s a tremendous amount of data to wrangle and analyze. What are some of the best practices that can help firms understand the incoming torrent and convert it into next best actions?
We recently spoke with Henry Zelikovsky, CEO of Softlab360, a software development shop that offers an outsourced predictive analytics service. His comments have been distilled into the following seven recommendations for improving your firm’s data analytics.
1. Data is only useful when it is reliable.
Less than half of business leaders rated their companies’ data reliability as “very good”, according to a recent survey by data management provider Talend. These executives are either using unreliable data to make critical decisions or making decisions based on “gut feeling”, which 36% of business leaders admitted to.
Reviewing and learning from historical data can greatly improve data reliability, according to Zelikovsky. “The data we learn from is historical data, and we learn from it for future situations,” he explained.
“Once we get new data we periodically relearn… and eventually we get to a consistency that’s reliable.” Once a certain level of reliability is reached, the client can use the discovered patterns to construct their own future scenarios and predict outcomes.
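A minimal sketch of that relearn-until-consistent loop, assuming a scikit-learn style workflow; the batch source and the 0.9 reliability threshold are illustrative assumptions, not Softlab360’s actual pipeline:

```python
# A sketch of periodic relearning, assuming scikit-learn.
# `batches` is a hypothetical iterable of (X, y) arrays arriving over time.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def relearn_until_reliable(batches, threshold=0.9):
    """Refit on accumulating history; stop once predictions for newly
    arrived data are consistently accurate (threshold is illustrative)."""
    model = LinearRegression()
    X_hist, y_hist = [], []
    for X_new, y_new in batches:
        if X_hist:  # score the current model against the new data first
            if r2_score(y_new, model.predict(X_new)) >= threshold:
                return model          # patterns have become reliable
        X_hist.extend(X_new)
        y_hist.extend(y_new)
        model.fit(X_hist, y_hist)     # relearn on all history so far
    return model
```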
2. Data slicing is an important data analysis tool.
Data slicing is the process of breaking a large segment of data into smaller parts so it can be analyzed from different points of view. Eighty percent of Softlab360’s data comes from the government, and that data has a lot of issues: it arrives neither organized nor normalized, and a vast majority of developers surveyed by Socrata found that government data is not clean and accurate, is not in a format suited to their work, and is not well documented.
This means a lot of work is needed to turn the raw data into usable information. Zelikovsky described the process as taking an Excel spreadsheet and reading the data by rows, by columns, or diagonally to pull different types of information out of the larger set.
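As a rough illustration of what those three views look like in code, here is a slicing sketch in pandas; the file and column names are hypothetical:

```python
# A minimal data-slicing sketch with pandas: the same table viewed by
# rows, by columns, or as a computed cross-section. Names are hypothetical.
import pandas as pd

df = pd.read_csv("government_filings.csv")   # raw, messy source data

by_rows = df[df["state"] == "NY"]            # slice by rows: one region
by_columns = df[["age", "income"]]           # slice by columns: two fields
cross_section = (
    df.groupby("age")["income"].median()     # a "diagonal" view: one
)                                            # statistic cut across both axes
```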
3. Good data analysis returns multiple nuanced answers.
Instead of just one “correct” answer, good data analytics can often find multiple statistically significant results that you may not even have expected. These multiple answers often add up to a complex and nuanced picture of the situation.
When Robert Kirk, CEO of Intergen Data, wanted to know when people make the most money in their lives, Zelikovsky found two answers: 47 and 51. Eighty percent of Americans hit peak earnings at 47, at around $62,000, while the other 20% peak at 51, earning around $157,000. The Softlab team was also able to find several previously unrecognized pockets of spending and saving throughout people’s lives that could help classify consumers in new ways.
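A toy sketch of how a multi-peaked result like this can surface in practice; the income figures below are hypothetical stand-ins shaped like the article’s example, and SciPy’s find_peaks reports every local maximum rather than forcing a single answer:

```python
# A minimal sketch of surfacing more than one answer with SciPy.
# The income-by-age values are hypothetical, with local maxima at 47 and 51.
import numpy as np
from scipy.signal import find_peaks

ages = np.array([44, 45, 46, 47, 48, 49, 50, 51, 52, 53])
median_income = np.array([54, 58, 60, 62, 57, 55, 90, 157, 110, 70]) * 1000

peaks, _ = find_peaks(median_income)
print(ages[peaks])   # [47 51]: two answers, not one "correct" one
```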
4. Stored data must be properly structured.
If data is not collected and stored properly, data analysts cannot use it. Zelikovsky explained that “in the corporate world, people use data that they collected over 20, 30 years.” That data is typically structured by the application it sits in, and that structure frequently does not allow data scientists to use it for machine learning.
In cases where data has not been stored correctly, firms will have to collect data properly for a few months before they can go in and analyze it.
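As an illustration of that restructuring step, a short pandas sketch that pivots application-shaped event records into the one-row-per-entity feature table most machine-learning libraries expect; the file and column names are hypothetical:

```python
# A sketch of restructuring raw application records into a model-ready
# table with pandas. Names are hypothetical.
import pandas as pd

raw = pd.read_csv("app_transactions.csv", parse_dates=["timestamp"])

# One row per customer, one numeric column per expense category:
# the flat feature matrix machine-learning code can consume directly.
features = raw.pivot_table(index="customer_id",
                           columns="expense_category",
                           values="amount",
                           aggfunc="sum",
                           fill_value=0)
features["n_events"] = raw.groupby("customer_id").size()
```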
5. Watch out for surprises.
Zelikovsky described one project where his team, while extracting one data set, accidentally found a number of other data points and patterns with interesting characteristics that the client had not asked them to look for. The consistency of these found patterns was useful, so the data scientists bookmarked them to learn from later and add a new dimension to the original question.
Studying changes that seem ‘unusual’, along with the correlating factors that follow them, is proving to be an increasingly powerful way to predict future events. By following up on surprising correlations, Softlab360 is able to discover much more about the other relationships at play.
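One hedged way to operationalize “watch out for surprises” is to flag period-over-period changes that are statistical outliers and then ask what else moved with them. A minimal pandas sketch, with hypothetical metric names:

```python
# A sketch of flagging 'unusual' moves and following up on correlations.
# `df` is a hypothetical table: one column per tracked metric, indexed by date.
import pandas as pd

def surprising_moves(df: pd.DataFrame, z: float = 3.0) -> pd.DataFrame:
    """Return period-over-period changes more than z standard
    deviations away from each metric's typical change."""
    deltas = df.diff()
    scores = (deltas - deltas.mean()) / deltas.std()
    return deltas[scores.abs() > z]

# Follow up on a surprise by asking what else moved with it:
# df.corrwith(df["metric_with_surprise"]) ranks the other relationships.
```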
6. Finding the answers to one question often leads to another.
The data analysis process does not end when the answer to a question is found; oftentimes that answer leads directly to another question to unravel. “The point of our service is you always have a question behind a question,” Zelikovsky noted. “You can ask questions initially as a business starting point, and when you see the consistency of these results, you can say, ‘based on that, I have to ask something else.’ Our value to the business continues to grow as long as we keep producing something new with our learning.”
7. Data patterns reinforce themselves.
Tracking the ways that expenses correlate across physical locations or events in consumers’ lives is becoming a critical driver of AI analysis in the retail and financial sectors. Zelikovsky and Kirk found that expenses were tied not just to events but also to each other. Patterns in the data were not only reinforcing themselves but also influencing one another, organizing into pockets of expenses that could predict other pockets of expenses to come.
“So it’s not just, Hey, everyone has a child at around the average age of 24,” Kirk illustrated, “but what are the other preceding cohorts that would make that change either go longer or go shorter? And that was the next greatest question, because then if we did that, then we could really start to map out additional life events.”
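A simple way to test Kirk’s idea that one pocket of expenses foreshadows another is a lagged correlation. The sketch below assumes a hypothetical table of monthly spend per expense category:

```python
# A sketch of testing whether one expense "pocket" predicts another.
# `spend` is a hypothetical DataFrame of monthly totals per category.
import pandas as pd

def pocket_predicts(spend: pd.DataFrame, earlier: str,
                    later: str, lag_months: int) -> float:
    """Correlate category `earlier` with category `later`
    shifted back by `lag_months` months."""
    return spend[earlier].corr(spend[later].shift(-lag_months))

# e.g. does childcare spending foreshadow education spending years later?
# pocket_predicts(spend, "childcare", "education", lag_months=36)
```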