- How to Build a Machine Learning Platform
- What is Data Slicing?
- 80% of Data Comes from the Government
- Storing Massive Amounts of Data
- Analyzing Pockets of Expenses
- Predicting Critical Illnesses
- Big Data Qualifications Service
Craig: I’m really excited for this episode. We have two guests that I’m going to introduce now. The first one is Henry Zelikovsky, CEO of Softlab360. Hey, Henry.
Henry: Hey guys, morning.
Craig: And our second guest is Rob Kirk, founder and CEO of InterGen Data. Hey Rob, what’s up, man.
Rob: Doing well. Great to be here.
Craig: This is an exciting episode. It’s our month of artificial intelligence, machine learning, and predictive analytics, so that’s what we’re going to talk about on this episode. But real quick before that, where are you guys calling from? Henry, where are you right now?
Henry: I’m in New York.
Craig: Lovely New York City. And Rob, where are you?
Rob: I’m in Dallas, Texas, home of the Cowboys and the one and only Yankee fan here.
Craig: Must be very lonely. We won’t get into football or baseball, or we’ll never stop. So we’re talking AI. The reason I have you guys on is that, Rob, your firm is based around artificial intelligence and data analytics, and Henry’s company was the Intel inside of your company: it built all the technology that drives the artificial intelligence and machine learning. So I want to break it down and talk about how that worked. First, Rob, can you just give us a quick history of InterGen Data and then why you brought Henry in?
Rob: Yeah, absolutely, and thank you. InterGen Data was founded because of a life event that happened to my family. We moved my grandfather into a long-term care facility because of Alzheimer’s, and he passed away shortly afterward. We weren’t prepared as a family, not for the amount of money, not emotionally, and not legally. And I said, who’s worse off than myself? I’m in the industry. I’m CIO of a top 40 independent broker-dealer, and I couldn’t help my own family. But I kept thinking in the back of my mind, there had to be a better way to plan for this. There had to be some way to find out when something like this typically happens to somebody. And quite frankly, it was real simple. I said, is there data? Can I find enough data out there that could actually help me understand when this could happen to somebody? But I didn’t know how to start searching for AI engines, or engineers, or people that could do that, and I certainly didn’t know how to manage them. So I reached out to various people in the industry, and that’s how I got in touch with Henry.
How to Build a Machine Learning Platform
Craig: Excellent. And Henry, when Rob came to you, what did you do and how did you build that very first version of his tech for him?
Henry: By that time we had a platform already built. We were offering it as a service, not as a platform or tool set in a box, and we offer that service today. We operate the service ourselves, through data science techniques that qualify the data and the problem, and we do that for our customer. So I was interested in Rob’s business case. It was very unusual: life-events-driven prediction. We decided to apply our technology, applying machine learning concepts and capability to learn from the data he wanted to examine to address his question. We decided to offer a proof of concept, to see what insight of business value we could derive from the data he was looking to supply.
Craig: And Rob, what is the name of the engine? The life events engine that you guys built?
Rob: The name of the engine is Solomon: System of Life Occurrences for Managing, Overcoming, and Navigating.
Craig: And that’s the engine that Henry built for you?
Rob: Absolutely. He’s built it and we use that every day of our life.
Craig: So Henry, can you talk a little bit about the underlying technology, the math behind it and how it works?
Henry: Yes. Our platform is built on the concept of Bayes, as we implement it. It’s an idea, a directional idea: the algorithm is usually applied to neural networks, and we decided that we were going to apply it to learn behaviorally, so to say, from a number of techniques and a number of data sets. We have other algorithms applied in the platform, but this is our primary direction. In Rob’s case we had four data sources, and he gave us seven business questions to answer in a predictive sense using our technology. So we use the concept of data slicing. We take the initial incoming data sources and slice the data in a variety of ways that we determine to be valuable. This is the period when our data science team does its work.
Henry: We filter out what we call statistical noise, the data that is not well qualified to machine learn from, and we continue evolving. We slice the data differently and use a number of different techniques and algorithms to learn from that data, answering the business question Rob wants answered. We do this a number of times, both in the classification sense and in clustering. We mix both; our system has the capability to merge the two structures. The point is to see consistency in the results. If we predict on the same data set, sliced differently and run through different algorithms, and we achieve the same probability in the result (we rank probability in five ranks), then once we reach that consistency, we believe what we find. Then we can go back into the original data with a different tool set and just randomly look for a certain life event in a certain personal life category: are we close enough, within a range of probability deviation? If we are, we are answering the question as precisely as we can.
Henry: The data we learn from is historical data, something already known; we just learn from it for the future. Once we get new data, we periodically relearn. We don’t have to relearn every time the data changes, but our data science team decides when it’s worth it to learn again from the ground up, using the concept of removing statistical noise, and eventually we get to consistency that’s reliable. Once that reliability is reached, Rob can use it within his use case; he can rely on the data to construct his own future cases.
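The consistency check Henry describes, where estimates from different slices and algorithms are accepted only when they land in the same one of five ranks, can be illustrated with a minimal Python sketch. The equal-width rank boundaries, the function names `to_rank` and `is_consistent`, and the sample probabilities are assumptions for illustration; the transcript does not say how Softlab360 defines its ranks.

```python
def to_rank(probability: float) -> int:
    """Map a probability in [0, 1] to one of five ranks (1 = lowest, 5 = highest).

    Equal-width bands are an assumption made for this sketch.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return min(int(probability * 5) + 1, 5)


def is_consistent(estimates: list[float]) -> bool:
    """Return True when every estimate falls in the same rank."""
    return len({to_rank(p) for p in estimates}) == 1


# Hypothetical estimates for one question, produced from different
# slices of the same data by different algorithms.
agreeing = [0.62, 0.66, 0.71]
disagreeing = [0.35, 0.62]

print(is_consistent(agreeing))     # all three estimates share a rank
print(is_consistent(disagreeing))  # the two estimates fall in different ranks
```

Comparing ranks rather than raw probabilities tolerates small differences between runs while still flagging estimates that disagree substantially.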
What is Data Slicing?
Craig: So Henry, can you explain what is data slicing?
Henry: Think of the data that comes to us, in either a binary sense or a textual sense, kind of almost in an Excel fashion, comma separated or delimited on some basis. We can look at the data row by row, column by column, and, in a mathematical sense, I can look at data diagonally. So data slicing is the tool set technique that basically extracts a small portion from the entire data set. So we slice it based on particular
Craig: Rob, so you got your first platform built, and the proof of concept Henry built was up and running. What was the first question you wanted Henry and his team to answer?
Rob: You know, it’s kind of funny you ask that, Craig, because our very first question was, can you tell me when somebody makes the most amount of money in their life? I just wanted to know, is there enough data out there to do that? And quite amazingly, he was able to do it. But what startled me was that he actually came back with two answers. The first answer was 47. The second answer was 51. I went to Henry and said, hey, I asked for one answer, why did I get two? And he came back and said, well, it’s statistically significant and we need to review this. That’s when he showed me, by looking through the data, that the first answer is 47, and that’s when 80% of America makes the most amount of money in their life, roughly around $62,000. The second answer was 51, and that’s when 20% of America makes the most amount of money in their life, but they make $157,000. It was such a vast difference that I couldn’t ignore it; the data spoke for itself. And it was only through Henry’s machine that I was even able to know this. That’s what he’s been able to do for us.
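As a rough illustration of row, column, and diagonal slicing, here is a minimal Python sketch on a small in-memory table. The helper names and the sample table are hypothetical; this is not Softlab360’s tool set, only one way to read what Henry describes.

```python
Table = list[list[float]]


def row_slice(table: Table, start: int, stop: int) -> Table:
    """Keep only the rows in the half-open range [start, stop)."""
    return table[start:stop]


def column_slice(table: Table, columns: list[int]) -> Table:
    """Keep only the listed column positions from every row."""
    return [[row[c] for c in columns] for row in table]


def diagonal_slice(table: Table) -> list[float]:
    """Take the main diagonal: row i, column i."""
    width = len(table[0]) if table else 0
    return [table[i][i] for i in range(min(len(table), width))]


# Hypothetical 3x3 table standing in for a delimited file.
table = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

print(row_slice(table, 0, 2))
print(column_slice(table, [0, 2]))
print(diagonal_slice(table))
```

Each helper returns a smaller portion of the same table, which could then be handed to a learning algorithm on its own.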
Craig: Right? The artificial intelligence machine learning aspect of his system enabled it, not just to come back with one easy answer, but two answers that showed more complexity and nuance in the answers.
Rob: Absolutely. And from there we started finding other specific points of when people would spend money and when they would save money. These little pockets came at the ages of 25, 27, 29, 31, and 33, and each one was different. That’s when we really started to use the power of what Henry’s system could do, because it was classifying all of these various people in different ways that we had not known before.
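A short worked calculation, using only the figures Rob quotes for the two peak-earning groups, suggests why a single answer would have been misleading. The sketch below blends the two groups into one weighted average; the variable and function names are illustrative.

```python
# Figures as quoted in the interview: share of population,
# age of peak earnings, and peak income in dollars.
groups = [
    {"share": 0.80, "peak_age": 47, "peak_income": 62_000},
    {"share": 0.20, "peak_age": 51, "peak_income": 157_000},
]


def blended(groups: list[dict], key: str) -> float:
    """Population-weighted average of one field across the groups."""
    return sum(g["share"] * g[key] for g in groups)


blended_age = blended(groups, "peak_age")
blended_income = blended(groups, "peak_income")

print(round(blended_age, 1), round(blended_income))
```

Blending gives a peak age of about 47.8 and a peak income of about $81,000, a figure that describes neither the 80% group at $62,000 nor the 20% group at $157,000.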
80% of Data Comes from the Government
Craig: So can we talk a little bit about the data you’re using? What data are you pulling in to generate these answers?
Rob: We pull in a lot of data from several sets of resources. Most of it, about 80% of our data, comes from the government: things like the CDC, the Bureau of Labor Statistics, the FDA, and the DOJ. The good thing about data from the government is it’s free. The bad thing is it’s really not normalized. It’s not ready for huge data projects and consumption. Secondly, what we found out is that a lot of departments would take data from each other, use it, and append data to it, so it’s always about going back and forth to find the original source. And roughly, I think, the government has almost 600,000 data sets, so it’s not easy to go through it, but we’ve been working on this problem with Henry and his team since 2014. The other set of data that we get is from private resources. We can go out and buy data from different data vendors, from different people that produce their own data. And then we use that collectively in our system.
Storing Massive Amounts of Data
Craig: So Henry, what database technology did you use to be able to store all this massive amount of government and private data and what were some of the challenges you overcame?
Henry: First, our platform sits in the Amazon AWS cloud. We use a variety of data forms in AWS to store incoming data. Government data, to use that example for a moment, comes as a collection of files, so we can store a variety of files as they come to us from the outside. Then, through the machine learning mechanism, we construct internal binary files that have a meaning to us internally to progress the process. The system is distributed. It’s a collection of different processing nodes, both algorithmic and data management. The data management side prepares data for the calculations, which we do with naive Bayesian math, and then handles the resulting set. And then we merge and regroup on that basis as well.
Henry: For the end result that we prepare at the completion of our learning, we support MySQL, Postgres, and Oracle. Rob received an Oracle version initially. I have to say we also use some portions of Oracle technology for other means, but Oracle in this case was the resulting data store. One thing I would add to what Rob said earlier is that since the data comes from the government, it is a little bit unruly, right? Once we extracted data consistently, we found, almost by accident, other data points: characteristics of the data that he did not ask us about, but that we found useful because we saw consistency in using them. So we put that data aside. We said, we can learn more if we use that; if we can add another question, another dimension to your questions, we can learn a little more.
Henry: And that’s what we’ve been doing since 2014. We incrementally learn more and more, and it yielded something we tried to achieve: it gave him the opportunity to learn from the result and ask the next question. The point of our service is that you always have a question behind a question. You can ask questions initially as a business starting point, and when you see the results, and especially the consistency of these results, you can say, based on that, I have to ask something else. You ask that, we continue working at it, and then you ask something else. So that’s the value of deriving insight from the data, as unruly as it might be. Your value to the business continues to grow as long as we produce something new with our learning.
Analyzing Pockets of Expenses
Craig: Rob, what was the next big problem you had Henry solve after you realized the data could deliver insights to your business?
Rob: If we go back to what Henry just stated, we saw pockets of expenses that kept popping up. I’ll take myself as an example: someone who’s Native American Indian, married to a Korean woman. I’m in financial services, she’s in communications, and we live in Dallas, Texas. With all of that, we started seeing correlations being built. It was the expenses of where we live: buying a home, buying a car, having children. But it was also the correlation between each of our cohorts. So me as a male Native American Indian and her as a Korean Asian woman, getting married, we would influence each other on when we would have children. But when we would have children would also be influenced by where we lived, by what we did, and, more importantly, by some of the other things around us. How many houses? Did we have children? What was the next car, the next vehicle? Each of these kept reinforcing itself while influencing the others, and so we saw how they would actually predict the next set of events. So it’s not just, hey, everyone has a child at around the average age of 24, but what are the other preceding cohorts that would make that happen later or sooner? That was the next greatest question, because if we could answer that, then we could really start to map out additional life events.
Craig: Henry, how did you deliver on that? When he saw those correlations based on geography and on his and his wife’s backgrounds, and how they would influence each other, how did you deliver on that?
Henry: These became additional characteristics to learn. We took the initial data and the initial learning process, and we added that criteria. We said, here is maybe a cross question; it goes across a number of characteristics. The first time, we learned from our data. Then we took a different technique. We said, let’s learn from the resulting data, so we take the benefit of not starting from the beginning again. We wanted to see how the result of each approach works out. And since we have the opportunity in the system to merge results and relearn from a result into a new result, this extraction gave us the opportunity to answer the question. Ultimately it brought us to a point where Rob can now ask: what could happen to me, or what would I like to happen to me, at this age? Let’s say by age group, because, you know, certain things happened to people in my family when they were 49, 51, or some other age. Is there a correlation to people like me, by gender, by background, by family size, by family history? What happened to other people in the universe of data that I may need to know about in terms of saving money for expenses, because something might occur to me that I’m not envisioning? I don’t know my own history too well, but the history of people like me suggests it. That was the target.
Predicting Critical Illnesses
Craig: Rob, what was the next big issue that you needed Henry and Softlab360 to help you build?
Rob: When we first started, it was around wealth-related life events: buying homes, buying cars, having children, getting married, getting divorced, starting businesses. But we soon realized that the next biggest thing was going to be health. I mean, the current statistics from the BLS, the Bureau of Labor Statistics, say that 75 million people are going to be over the age of 65 in 2030. Everybody’s retiring, and understanding health and healthcare was something that was really needed. And again, that goes back to my main driver. So we were able to create the life events, but next we had to go one more level, to one more thing: how does health play into this? It was only after asking Henry and providing the data that we were actually able to start predicting 81 different critical illnesses. And that was just a huge leap forward.
Rob: We were very fortunate that this happened before COVID hit, and we are fortunate now, because at a time when people are really considering health, the cost of health, and how it affects even their retirement, we’re there and we’re ready to help. And I couldn’t have done it without him. The next step is going to be, how do we go further? How do we add all of this together? Not just for a single person; the future that I see is a household. It’s me, my wife, my children. It might be an aging parent I have to take care of. It’s everything. So it’s a household view, not just a single individual view.
Craig: So the difference between an individual view and a household view is huge.
Rob: Absolutely massive, absolutely massive. And we see it today, not just anecdotally, but with the people around us. You hear stories all the time of people getting sick: COVID, heart attacks, cancer. It’s more than just a single event.
Craig: Henry, how did it complicate your problem that Rob wanted to add health-related life events?
Henry: Actually, it did not complicate our world. It became almost natural for us. If you can see all these occurrences in data, it leads us to think about what could happen to a person in the way we think things happen to us. Rob started with the example that, given a family history, there could be occurrences of some diseases by some period of time, and we prepare to handle this given the expenses we know of. Then the question is, what can we learn from that? When do these things change? And what about adjacent diseases? As we learn more and more about medicine and its discoveries, we can say: if some quality statistical data is available to learn from, and your family has one occurrence of this disease, what’s the probability that other diseases that medical science thinks it can cause would occur? What would that look like, and when?
Henry: Now you can start saying, okay, in a household sense, I need to prepare for that. How do I do that, given my current situation, given my current job, and correlating myself to people similar to me? The similar-to-me concept can be driven by a number of parameters that we learn from, and Rob can use that when he decides, for a given implementation of his product, which one fits in which case. Some people go by geographical region and say, what if I move to a better climate? Does that change anything? A different zip code, what does that mean? By the way, he tracks zip and zip plus four. So the more data we get, the more narrow we can be. That is, by the way, how we choose when to relearn: once we get zip plus four, it narrows and qualifies the universe of people. Now we can say, let’s recheck that balance, recheck that point of view.
Henry: COVID, in a way, basically just confirmed the idea of looking ahead, right? Nobody knew COVID would occur. But I can anticipate that hereditary diseases in my family history might make me inclined to get sick from other, consequential diseases. As COVID is ending right now, let’s say we see that people with diabetes were affected more at a certain age, in a certain category; not everybody with diabetes was affected, right? Using that as an example, in Rob’s case, he can learn from that. It would be another criterion. So his system and his concept can continue evolving the insight he draws, and he can do it ahead of any particular question, anticipating what the benefit is.
Henry: So his product can continue designing life events, life event occurrences as they might occur from a historical standpoint, and prepare for other future life events to expect and to discuss with people. He can go into an insurance company and say, you can design your business products based on this. Let’s see what I learned from the data around me. You can learn from that insight and build your predictive products, build your insurance products, and sample your insurance products. You can offer your constituency these products. Would they buy them if you explain to them what they’re for?
Rob: And if I can, Craig, the thing that I would add there is that his Bayesian approach really helped drive so much more. When COVID hit, we were able to take in the data around COVID and then say, okay, let’s look at comorbidities. We’ve been monitoring the top 20 comorbidities since the day it started, and we see how certain types of comorbidities would negatively impact certain races. As we know, COVID affected more minorities than it did people who were white. It’s not really an issue, just deal with it. What we found was that all these correlations started to bubble up, and we started to say, oh, hypertension, type 2 diabetes, ischemic heart disease: these are problems. And if they occur more in one race and one gender, know it, and use that data to help yourself and to help others. That was phenomenal. But if he had not taken that Bayesian approach of taking all the prior data, finding out what’s likely, and then using that for posterior data, we wouldn’t have been able to do that. So this system constantly keeps relearning and showing us new things that we never thought were even possible.
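The prior-to-posterior idea Rob mentions can be sketched with the simplest Bayesian update, a Beta-Binomial model. This is a textbook stand-in, not a description of how Solomon works, and the counts below are hypothetical.

```python
def update_beta(prior_alpha: float, prior_beta: float,
                events: int, non_events: int) -> tuple[float, float]:
    """Conjugate update: add observed counts to the Beta prior's parameters."""
    return prior_alpha + events, prior_beta + non_events


def beta_mean(alpha: float, beta: float) -> float:
    """Expected event probability under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


# Hypothetical prior learned from historical data: roughly 2 events in 10 cases.
prior = (2.0, 8.0)

# Hypothetical new observations: 6 events and 4 non-events.
posterior = update_beta(*prior, events=6, non_events=4)

print(beta_mean(*prior))      # belief before the new data
print(beta_mean(*posterior))  # belief after the new data
```

With these made-up counts the expected probability moves from 0.2 to 0.4, which is the sense in which new data shifts a belief built on prior data without discarding it.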
Big Data Qualifications Service
Craig: That’s excellent. We’re just about out of time, but Henry, can you tell us about what I’m referring to as your big data qualifications service? How does that work, and if a firm wants to hire you for that, what would be the process?
Henry: On our team, we have data science staff. We can help select the data, or the customer gives us a data set or a source of data. We look at the data with an eye to what it is useful for in machine learning, based on what we are expecting to learn. What is the business starting point? Sometimes there is no starting point; we have more cases where people say, learn what you can from this data. Our technique is basically to use the tools we have in our product. The data scientists, who understand what data qualifies for machine learning, use our own tools to, as I said earlier, slice the data differently and present it to initial algorithms that say this data is learnable. I can see what the cycle of machine learning produces as an interim result. It’s not the final result.
Henry: And that’s how we use data qualification. We use that first, and we explain to our customer what data we might or might not be using. We then apply the same idea to say, let’s learn from this data and see the results. That initial process of qualifying the data very often helps the customer determine whether they have enough data in their possession to apply machine learning. Would it be useful? Would we be able to learn anything? In the corporate world, people use data they collected over 20 or 30 years. Usually, in system design, that data is collected by the application for which it was built, and some of that structure is not learnable. What sometimes happens in a case like that is the customer says, I do want to take the benefit of the idea, so we can start storing data differently, in a much greater variety of ways. Let’s take three months to store data differently, then come back and relearn from that, and tell me if we can do this. That’s the value of this initial first step of qualifying the data. We can help identify the benefit, and if the data is not ready yet, then it’s not ready yet. We can do this a little later.
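One simple way to picture a first qualification pass is a per-column check for missing values and for variation. The sketch below is an assumption about what such a screen might look like, with a hypothetical `qualify_columns` helper, threshold, and sample rows; it is not Softlab360’s actual process.

```python
def qualify_columns(rows: list[dict], max_missing: float = 0.3) -> dict[str, bool]:
    """Flag each column as usable (True) or not (False) for learning.

    A column passes when at most `max_missing` of its values are absent
    and the values that are present are not all identical.
    """
    if not rows:
        return {}
    columns = sorted({key for row in rows for key in row})
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        missing_share = 1 - len(present) / len(rows)
        varies = len(set(present)) > 1
        report[column] = missing_share <= max_missing and varies
    return report


# Hypothetical records: "state" never varies and "note" is mostly empty.
sample_rows = [
    {"age": 47, "state": "TX", "note": None},
    {"age": 51, "state": "TX", "note": None},
    {"age": 29, "state": "TX", "note": "x"},
]

print(qualify_columns(sample_rows))
```

A report like this is only an interim result: it suggests which columns might be worth presenting to a learning algorithm and which might need to be collected differently first.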
Craig: Of course, right guys, we are out of time. This has been a fascinating discussion. I really appreciate it. So how can people learn more about your companies, Rob, where can people go to learn more about your company?
Rob: Well, first of all, we’re going to be at T3. So stop by booth 704. Come check us out during the T3 advisor conference and the enterprise conference, but more importantly, go to www.intergendata.com
Craig: And Henry your company’s website?
Henry: Our company website is Softlab360.com. You can find us there and see some information about us. Otherwise, we’d be happy to answer your questions once you have an interesting case to work on.
Craig: Fantastic guys. Thanks so much for being here. Appreciate your time.
Henry: Thanks, Craig. Cheers.