- More Than Just Sustainable Data
- A Sustainability Framework
- Avoiding Biased Data
Craig: All right, I’m happy to introduce our two guests for this episode, both from the firm Clarity AI. The first is Lydia Lawrie, who’s Strategy Manager for Clarity AI.
Lydia: All right. Thanks for having me.
Craig: And her co-guest for this episode is Ron Potok, head of data science for Clarity AI.
Ron: Hello. Good to see you.
Craig: It is good to see you both. I’m glad you could make it, and I’m glad we managed to coordinate this episode. This came out of the T3 conference, where Lydia and I met and had a great conversation over beers at the party. That was fun, and I learned a lot more about what you do, so I thought it’d be great to have you on the podcast to talk more about the technology behind Clarity AI and what you’re doing. So let’s kick it off. Lydia, could you please give us a 30-second elevator pitch for Clarity AI?
Lydia: Okay, sure. I’ll give it from my viewpoint, coming from the wealth management industry. If you think about sustainability, it’s complex, and there are a lot of ways to approach it. Clarity AI is a sustainability technology platform whose core mission is to enable transparency. So instead of just providing a score, our main goal is to let users of all types configure solutions and make better sustainability decisions. Ron will definitely go into more detail on how we do that, on both the data side and the methodology side.
More Than Just Sustainable Data
Craig: And that’s why Ron is here. I like that you have two people, so you can bounce things around and whoever is strongest can take that particular part. You mentioned you’re a sustainability data company. Can you explain what that means, and why you’re actually more than just a sustainability data company?
Lydia: Yes. Beyond data, it’s also about sustainability expertise. We have humans in the loop across the process. But I would say one of the most interesting aspects is the way we apply that methodology to our very broad and deep data coverage. From that standpoint, you can use that data to solve a lot of different questions and needs for our clients, whether you’re an asset manager trying to confirm a mandate for a particular fund, or an investor trying to understand the underlying impact your portfolio is making. That’s what I mean by beyond data. We’re taking sustainability expertise and providing it in a way that’s transparent and usable across different technology integrations, whether that’s a web app, an API or a widget.
Craig: That’s excellent. Transparency is important, especially when it comes to any kind of data. But we were also talking earlier about the issues around ESG data specifically. So why is ESG data so difficult to access?
Ron: Sure, I can try to handle that. ESG data is part of this big alternative financial data ecosystem, and ESG data in particular suffers from many of the same problems as other alternative data sources: a lack of standardization and a tendency to be unstructured. The data tends to live in company reports, news sources and various other unstructured places. Part of the value we provide is pulling all of that data together in an objective manner and normalizing and standardizing it, so you can compare apples to apples rather than apples to oranges. Part of how we do that efficiently is by using AI, which lets us gather data objectively and efficiently and then combine it into an accurate sustainability database. From there we move it down the line into sustainability frameworks and finally display it to you, so that you can understand what sustainability means for the various companies in our universe.
A Sustainability Framework
Craig: Can you explain what a sustainability framework is?
Ron: Sure. Sustainability data right now tends to be defined in terms of both environmental and social metrics. On the environmental side, you might think of the amount of Scope 1 CO2 emissions, or the amount of water waste. Those tend to be quantitative in nature and measured in tonnes or something like that. A sustainability framework allows you to combine carbon emissions with other environmental metrics, like water waste, or with social metrics, like the gender diversity of your workforce: Is there a gender pay gap? Are you violating bribery and corruption standards? That’s what a sustainability framework allows you to do: aggregate those different types of information into a single number. And one thing we try to do is provide you with different frameworks, and then customizability within those frameworks.
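To make the idea of aggregating different kinds of metrics into a single number concrete, here is a minimal Python sketch. It assumes each metric is min-max rescaled to a 0 to 100 scale (so tonnes and percentages become comparable) and then combined with a simple weighted average; the metric names, weights and figures are hypothetical, and this is not Clarity AI’s actual methodology.

```python
from typing import Dict, Tuple

# Hypothetical input shape: {metric_name: ({company: raw_value}, higher_is_better)}
Metrics = Dict[str, Tuple[Dict[str, float], bool]]


def rescale(values: Dict[str, float], higher_is_better: bool) -> Dict[str, float]:
    """Min-max rescale one metric across companies to 0-100 (100 = best).

    This strips the units (tonnes, percentages, ...) so metrics can be combined.
    """
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    scaled = {}
    for company, value in values.items():
        score = 50.0 if span == 0 else (value - lo) / span * 100
        scaled[company] = score if higher_is_better else 100 - score
    return scaled


def framework_score(metrics: Metrics, weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted average of rescaled metrics: one number per company (assumed scheme)."""
    total_weight = sum(weights.values())
    scaled = {name: rescale(vals, hib) for name, (vals, hib) in metrics.items()}
    companies = next(iter(metrics.values()))[0]  # assumes every company reports every metric
    return {
        company: sum(w * scaled[name][company] for name, w in weights.items()) / total_weight
        for company in companies
    }


if __name__ == "__main__":
    metrics = {
        # Environmental: Scope 1 emissions in tonnes CO2e (lower is better)
        "scope1_tco2e": ({"A": 100.0, "B": 300.0, "C": 200.0}, False),
        # Social: share of women in the workforce, in % (higher is better)
        "women_in_workforce_pct": ({"A": 30.0, "B": 50.0, "C": 40.0}, True),
    }
    weights = {"scope1_tco2e": 0.75, "women_in_workforce_pct": 0.25}
    print(framework_score(metrics, weights))  # {'A': 75.0, 'B': 25.0, 'C': 50.0}
```

A yes/no metric, such as whether a company has violated bribery and corruption standards, could be encoded as 0 or 1 before rescaling. Changing the weights is one simple form of the customizability Ron mentions.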
Lydia: And I’ll add an analogy on that. In the financial world, imagine if every company decided for itself what to report to the SEC. It would be really hard as a financial analyst to compare apples to apples. Although there isn’t one global standard, there are a few really important frameworks that are popular; take the United Nations as one example. But what makes them even more important is when governments regulate based on those frameworks. All of a sudden, there’s a lot more data coverage on those metrics, and we can actually compare who’s doing a better job versus a worse job.
Craig: That comes back to the standardization of data, which you have to do yourselves now because there isn’t any government framework around it.
Ron: Yeah, those frameworks are starting to appear, especially in Europe, but you’re right. There are no strong regulations right now.
Craig: And you have all this data coming into your systems and your frameworks from many, many data sources. How many sources would you say you collect data from?
Ron: In terms of data providers, we have over 80 different providers that we purchase data from, and then we also have news sources. We collect over 100,000 news articles a day, which allow us to analyze whether companies are behaving according to global norms. At the same time, we collect a lot of data from company-published reports, from company websites and so on, because, as you know, a lot of sustainability data today is self-reported. We’re also getting into additional third-party measurements. There are many different technologies coming online today, such as satellites, that allow third parties to measure emissions and other data for various companies, and they let us enrich the data we have and increase the accuracy of what we can provide.
Craig: With all this data coming in, how is Clarity AI aggregating all this disparate data?
Ron: It starts at the very beginning, with objective collection, and AI has some benefits in terms of data collection. It allows us to scale to over 30,000 companies very easily. As you know, computers are much more scalable than people, and computers are very good at reading today. So we leverage that technology to greatly expand our coverage and provide objective analysis.
Ron: When we as people read a New York Times headline that mentions Facebook, we automatically assume it must be another data-rights, fake-news or other controversy. A computer reading that headline doesn’t naturally make that assumption, and that’s one benefit of using computers: they can be objective and apply the same rules over and over again, which leads to standardization in a lot of ways.
Ron: We also take care of normalizing units, which other companies can struggle with, making sure we’re comparing apples to apples, like I mentioned earlier. That lets us bring everything into a normalized, standardized sustainability database, which then allows us to apply sustainability frameworks, like Lydia mentioned, to really compare companies on the sustainability axis. And I think that’s an important point: we’re experts in sustainability, and we integrate with others, like Aladdin, to provide the financial expertise, so you can combine sustainability with finance.
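Unit normalization of the kind Ron describes can be sketched in a few lines. This is a minimal Python illustration, assuming emissions arrive as (company, value, unit) records with free-text unit labels; the conversion table covers only a few illustrative spellings, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Conversion factors to metric tonnes (illustrative subset of unit spellings).
TO_TONNES: Dict[str, float] = {
    "t": 1.0,
    "tonne": 1.0,
    "tonnes": 1.0,
    "metric ton": 1.0,
    "metric tons": 1.0,
    "kg": 0.001,
    "kt": 1_000.0,
    "short ton": 0.90718474,   # 2,000 lb
    "short tons": 0.90718474,
    "lb": 0.00045359237,
}


def to_tonnes(value: float, unit: str) -> float:
    """Convert a reported quantity to tonnes; refuse to guess unknown units."""
    key = unit.strip().lower()
    if key not in TO_TONNES:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * TO_TONNES[key]


def normalize(records: List[Tuple[str, float, str]]) -> Dict[str, float]:
    """Map (company, value, unit) records to {company: tonnes}."""
    return {company: to_tonnes(value, unit) for company, value, unit in records}


if __name__ == "__main__":
    reported = [("A", 2_500.0, "kg"), ("B", 1.2, "kt"), ("C", 2_000.0, "short tons")]
    print(normalize(reported))
```

Raising an error on an unrecognized unit, rather than passing the raw number through, keeps a mislabeled figure from silently entering the database as if it were in tonnes.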
Avoiding Biased Data
Craig: But Ron, isn’t it true that an algorithm can only be as objective as the programmers writing it? If the programmers aren’t objective, or they have biases they don’t want to admit to, those biases can creep into the code.
Ron: You bet, that is very true, and it’s something we actually noticed in our AI algorithms. We leverage the latest technologies, and as you may know, there have been breakthroughs in NLP. We’re actually leveraging Facebook’s RoBERTa model. What we noticed is that Facebook is in the vocabulary of the RoBERTa model, and when the model read an article containing the word Facebook, it associated data privacy and controversies with it, because it had been trained on a corpus of news from the last 10 years. And sure enough, it had learned that association.
Ron: So, mistakenly, the model had built that association with Facebook into itself. We protect against that by running a program beforehand that masks all company names. When the model that’s analyzing the article receives it, instead of seeing “Facebook,” it sees the word “company,” so it doesn’t know which company it is. So you’re right: there is always some form of subjectivity. But it’s much easier for us to undo the biases we know about, and this is a case of us undoing a bias that Facebook’s model mistakenly picked up.
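The masking step Ron describes can be sketched in Python as a preprocessing function that runs before any text reaches the NLP model. This is a minimal illustration, assuming the list of company names to mask is already known (for example, from the covered company universe). The function name is hypothetical, and a production pipeline might detect names with a named-entity recognition model rather than a fixed list.

```python
import re
from typing import Iterable


def mask_company_names(text: str, names: Iterable[str], placeholder: str = "company") -> str:
    """Replace every listed company name with a neutral placeholder before NLP scoring."""
    # Longest names first, so "Facebook Inc." is masked whole rather than as "Facebook".
    ordered = sorted(set(names), key=len, reverse=True)
    if not ordered:
        return text
    # (?<!\w) and (?!\w) act like word boundaries but also work for names ending in "."
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(n) for n in ordered) + r")(?!\w)",
        flags=re.IGNORECASE,
    )
    return pattern.sub(placeholder, text)


if __name__ == "__main__":
    headline = "Facebook Inc. faces new questions over Facebook's data practices"
    print(mask_company_names(headline, ["Facebook", "Facebook Inc."]))
    # company faces new questions over company's data practices
```

Because the model downstream only ever sees "company," any association it learned for a specific name during pretraining cannot influence the score.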
Craig: Sure, that’s an excellent example. Going back to how you aggregate the disparate data from 80-plus data providers, 100,000 news articles a day and published company reports: you said it’s flexible and customizable in terms of the frameworks companies can apply to your data, and that you’ve built a number of default configurations. Can you talk about those, and why I’d pick one over another?
Ron: Sure. There are several popular frameworks. The most popular one today is ESG risk, and it became especially topical last week with Elon Musk’s tweets about ESG risk and MSCI’s ESG risk ratings. ESG risk is the financial risk associated with your company according to these environmental, social and governance factors. The standard way to score and report that risk is best in class: you compare car companies against other car companies and rank them, typically from 0 to 100 or 1 to 100. Other frameworks may not use best in industry; they may use best in universe instead, and that’s a different choice. You can customize and decide to use best in universe. In that case, even though Tesla suffers from some controversies in the news associated with the way it treats its employees, it can rank better, because it’s no longer being compared only against other car companies. The same goes for Exxon and other oil companies: under best in class, there still has to be a best in class among the oil companies. You can’t penalize them all, because you’re only comparing them against other oil companies. So there are different frameworks.
Ron: And as Lydia was mentioning earlier, beyond ESG risk we also offer frameworks like the UN Sustainable Development Goals. For our UN Sustainable Development Goals framework, we don’t use best in class or best in industry; we actually use best in universe. So if you do think different industries should be penalized differently because they affect the world differently, this compares them more fairly and penalizes those industries appropriately, if that makes sense.
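The difference between best in class (best in industry) and best in universe comes down to which peer group a company is ranked against. A minimal Python sketch, assuming each company already has a single score where higher is better, ranked on a simple 0 to 100 percentile scale; the companies, industries and scores are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Company = Tuple[str, str, float]  # (name, industry, score); higher score = better


def percentile_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """0-100 rank within a group: share of the other members with a strictly lower score."""
    n = len(scores)
    if n == 1:
        return {name: 100.0 for name in scores}
    return {
        name: 100.0 * sum(1 for other in scores.values() if other < s) / (n - 1)
        for name, s in scores.items()
    }


def best_in_class(companies: List[Company]) -> Dict[str, float]:
    """Rank each company only against peers in its own industry."""
    by_industry: Dict[str, Dict[str, float]] = defaultdict(dict)
    for name, industry, score in companies:
        by_industry[industry][name] = score
    ranks: Dict[str, float] = {}
    for peers in by_industry.values():
        ranks.update(percentile_ranks(peers))
    return ranks


def best_in_universe(companies: List[Company]) -> Dict[str, float]:
    """Rank every company against the whole universe, regardless of industry."""
    return percentile_ranks({name: score for name, _, score in companies})


if __name__ == "__main__":
    companies = [
        ("AutoA", "autos", 60.0), ("AutoB", "autos", 70.0),
        ("OilA", "oil", 20.0), ("OilB", "oil", 35.0),
    ]
    print(best_in_class(companies))     # OilB ranks 100.0: best among oil companies
    print(best_in_universe(companies))  # OilB ranks about 33.3 against the whole universe
```

Under best in class, some oil company always tops its group, which is Ron’s point about Exxon; under best in universe, that same company can fall well down the ranking. Tied scores share a rank under this convention.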
Lydia: And I’ll add one thing there that I think is an interesting example around data: in our web app, you can actually choose the threshold of data relevance you want. Let’s say our coverage of a specific metric is below 30%; you can exclude that metric from your analysis, for instance. So there are ways to customize a framework based on your needs.
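Lydia’s coverage threshold can be sketched as a filter that drops any metric reported by too small a share of companies. A minimal Python sketch, assuming coverage is measured as the fraction of companies in the analysis with a non-missing value; the 30% default comes from her example, while the function names, metric names and data are hypothetical.

```python
from typing import Dict, List, Optional

# {metric_name: {company: value or None}}; a missing key also counts as "not reported"
MetricData = Dict[str, Dict[str, Optional[float]]]


def metric_coverage(data: MetricData, companies: List[str]) -> Dict[str, float]:
    """Share of the given companies that report each metric."""
    if not companies:
        raise ValueError("need at least one company")
    return {
        metric: sum(1 for c in companies if values.get(c) is not None) / len(companies)
        for metric, values in data.items()
    }


def select_metrics(data: MetricData, companies: List[str], min_coverage: float = 0.30) -> List[str]:
    """Keep only metrics whose coverage meets the user's threshold."""
    coverage = metric_coverage(data, companies)
    return [metric for metric, share in coverage.items() if share >= min_coverage]


if __name__ == "__main__":
    companies = ["A", "B", "C", "D", "E"]
    data = {
        "scope1_tco2e": {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0},  # 80% coverage
        "water_use_m3": {"A": 5.0},                                 # 20% coverage
        "gender_pay_gap": {"A": 0.1, "B": None, "C": 0.2},          # 40% coverage
    }
    print(select_metrics(data, companies))  # ['scope1_tco2e', 'gender_pay_gap']
```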
Craig: I like that, because there are so many data points we’re talking about. I know, for example, that MSCI, which you just mentioned, has, I think, 35 or 45 data points that they use when ranking public companies. How many data points do you use?
Ron: That, again, is customizable, and each framework has its own default number of metrics that we take into account. With the Sustainable Development Goals, I believe we have around 40. For ESG risk, which is similar to MSCI’s ESG risk, it depends on the industry, but we have about 120 different metrics.
Craig: So, 120 different metrics. How do you avoid compounding errors? Each metric is measured in what could be a subjective fashion, because some of them are very subjective measurements, and each measurement has some margin of error. Won’t those errors just compound on top of each other, so now you have 120 times as much error?
Ron: One thing that helps is that these metrics are pretty independent of each other, so an error in one doesn’t necessarily add to an error in another; that’s what we call uncorrelated data. When we add social metrics to environmental ones, there may be very limited correlation between them, and that’s actually what we find: social metrics tend to be fairly uncorrelated with environmental metrics. So you don’t get the standard compounding of errors adding together; you get a little protection. That doesn’t mean there aren’t errors in the data, and we’re constantly working on what we call the accuracy of our data, constantly refining our methodologies, our data and our models, so that we can present the closest thing to apples to apples we can.
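Ron’s point about uncorrelated errors can be checked with standard variance arithmetic. For an equal-weight average of n metrics, each with error standard deviation sigma and the same pairwise correlation rho between every pair of errors, the error of the average is sigma * sqrt((1 + (n - 1) * rho) / n). The Python sketch below rests on those simplifying assumptions (equal weights, equal error sizes, one common correlation); real framework weights and error structures will differ.

```python
import math


def error_of_average(sigma: float, n: int, rho: float) -> float:
    """Std. dev. of the error of an equal-weight average of n metrics.

    Each metric's error has std. dev. sigma; every pair of errors has correlation rho.
    Var(mean) = (n * sigma^2 + n * (n - 1) * rho * sigma^2) / n^2
    """
    return math.sqrt(sigma ** 2 * (1 + (n - 1) * rho) / n)


def error_of_sum(sigma: float, n: int, rho: float) -> float:
    """Std. dev. of the error of the plain sum of the same n metrics (n times the average)."""
    return n * error_of_average(sigma, n, rho)


if __name__ == "__main__":
    n = 120
    for rho in (1.0, 0.3, 0.0):
        print(f"rho={rho}: sum error {error_of_sum(1.0, n, rho):.1f}, "
              f"average error {error_of_average(1.0, n, rho):.3f}")
```

With 120 metrics and fully correlated errors (rho = 1), the error of the sum is 120 times a single metric’s error, which is Craig’s worry. With uncorrelated errors (rho = 0), it is only about 11 times (the square root of 120), and the error of the average shrinks to roughly 0.09 of a single metric’s error. The errors shrink relative to the total but never vanish, which matches Ron’s caveat.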
Lydia: Yeah, and Ron, on that point, I also want to highlight the number of people we have working on the data teams. At Clarity we’re around 250 people, and clicking through our internal HR system and adding up the numbers, there are over 100 people on the data teams alone. That doesn’t even count our tech experts in product management, which is an entirely separate area from the folks focused only on the process Ron just went over. So I think that’s an important piece to highlight.
Craig: Excellent. It’s good to know there are people working behind the scenes to get the data moving.
Ron: It ties back to our mission, which is to provide sustainability information to the markets, and information is only as valuable as people’s trust in it. What we really want to earn in the market is trust that our data is backed by documents, by a source of truth, and that you can see that transparency and where the data is coming from. We’re trying to earn the trust that you can leverage this information to make better, more sustainable decisions.
Craig: There were three measurements we were talking about around your company’s data: data sources, coverage and estimation models. How do you measure coverage, and how wide is your coverage?
Ron: Sure. We mainly speak about coverage in terms of public companies, and we have coverage of over 40,000 publicly traded companies on many different exchanges around the world, and it’s continually increasing. That’s how we attempt to provide the broadest coverage to the marketplace. We’re also beginning to integrate more closely with Aladdin in order to provide coverage of private markets as well. That’s another exciting extension of our work: providing indications of how private companies would perform if they were to move into the public markets.
Craig: Ron, you’re jumping ahead, you didn’t give me a chance to ask you a question about Aladdin.
Lydia: Before you jump ahead, I’ll add something that Clarity AI does differently: not only do we cover the underlying securities, we actually look through funds. Our coverage of 40,000 securities rolls up so that we provide the same data on 220,000 funds. I think that’s also an important concept for the wealth management audience as they think about what their clients and advisors are using to manage assets today.
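Looking through a fund means rolling security-level data up through the fund’s holdings. A minimal Python sketch, assuming the fund score is the holdings-weighted average over the securities that have coverage, reported together with the share of fund weight that is covered; the identifiers and numbers are hypothetical, and Clarity AI’s actual roll-up rules may differ.

```python
from typing import Dict, Tuple


def look_through(holdings: Dict[str, float], scores: Dict[str, float]) -> Tuple[float, float]:
    """Roll security-level scores up to a fund.

    holdings: {security_id: portfolio weight}
    scores:   {security_id: score}, only for securities with coverage
    Returns (fund score re-weighted over covered holdings, covered share of fund weight).
    """
    covered = {sec: w for sec, w in holdings.items() if sec in scores}
    covered_weight = sum(covered.values())
    if covered_weight == 0:
        raise ValueError("none of the fund's holdings have coverage")
    fund_score = sum(w * scores[sec] for sec, w in covered.items()) / covered_weight
    return fund_score, covered_weight / sum(holdings.values())


if __name__ == "__main__":
    fund = {"SEC1": 0.5, "SEC2": 0.3, "SEC3": 0.2}  # SEC3 has no coverage
    security_scores = {"SEC1": 80.0, "SEC2": 60.0}
    score, coverage = look_through(fund, security_scores)
    print(f"fund score {score:.1f}, coverage {coverage:.0%}")  # fund score 72.5, coverage 80%
```

Reporting the covered share alongside the score lets a user see how much of the fund the number actually describes, rather than silently treating uncovered holdings as zero.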
Craig: That is important. Can we dive a little deeper into the technology? Ron, you’ve mentioned NLP a number of times. What is the underlying tech you’re using for your NLP process?
Ron: Sure. As I mentioned earlier, there have been great advances in the last five years. Google started with the BERT transformer models, and I mentioned that we tend to leverage Facebook’s RoBERTa, but we have also been working with OpenAI on GPT-3. So the latest and greatest transformer models are what we’re applying in production every day, and we have these models reading hundreds of thousands of documents and news articles every day.
Craig: That’s excellent. And this enables you to consume unstructured data like news articles and company reports. What are some of the difficulties you’ve had when it comes to analyzing this kind of data?
Ron: There are technology challenges in applying deep learning models; they’re not your standard everyday models. People are still learning how to use them, and what they’re good and bad at. From a data scientist’s perspective, one of the fun aspects is leveraging models that someone else spent hundreds of thousands of dollars training, and learning what they’re good and bad at. That’s something we’ve really been applying.
Ron: The other one is that, as data scientists, we know all models are wrong, but some models are less wrong. And especially with our audience of sophisticated investors, accuracy and high quality matter. One thing we learned very early on is that even though we leverage models, we need the output to be as close to 100% accurate as we can get. So we’ve built into our processes sustainability experts who look at the results every day and make sure the models are performing at the highest level, and if they aren’t, the models get penalized for it. So I think one of the important aspects is the validation process, the human-in-the-loop validation of the models. That’s something intrinsic to our organization.
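The human-in-the-loop validation Ron describes can be sketched as a daily spot check: sample the models’ outputs, have an expert label the same items, measure agreement and keep the disagreements as corrections. A minimal Python sketch; the expert is stood in for by a plain function, and the names, labels and 95% acceptance threshold are hypothetical, not Clarity AI’s actual process.

```python
import random
from typing import Callable, Dict, List, Tuple

Correction = Tuple[str, str, str]  # (item_id, model_label, expert_label)


def validate_sample(
    predictions: Dict[str, str],
    expert_label: Callable[[str], str],
    sample_size: int,
    min_accuracy: float = 0.95,
    seed: int = 0,
) -> Tuple[float, bool, List[Correction]]:
    """Spot-check a random sample of model outputs against expert labels.

    Returns (accuracy on the sample, whether it meets min_accuracy, disagreements).
    """
    if not predictions or sample_size < 1:
        raise ValueError("nothing to validate")
    rng = random.Random(seed)  # seeded so a given review batch is reproducible
    sample = rng.sample(sorted(predictions), k=min(sample_size, len(predictions)))
    corrections: List[Correction] = []
    for item in sample:
        truth = expert_label(item)
        if truth != predictions[item]:
            corrections.append((item, predictions[item], truth))
    accuracy = 1 - len(corrections) / len(sample)
    return accuracy, accuracy >= min_accuracy, corrections


if __name__ == "__main__":
    # Hypothetical model output: does each article describe a controversy?
    model = {f"article-{i}": "controversy" if i % 3 == 0 else "no_controversy" for i in range(10)}
    expert = dict(model, **{"article-4": "controversy"})  # the expert disagrees on one item
    acc, ok, fixes = validate_sample(model, expert.__getitem__, sample_size=10)
    print(acc, ok, fixes)  # 0.9 False [('article-4', 'no_controversy', 'controversy')]
```

The collected corrections are the natural input for the penalty Ron mentions, for example as relabeled examples when the model is next fine-tuned.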
Collecting Private ESG Data
Craig: All models are wrong; some models are less wrong. Ron mentioned alternative investments and how you’re collecting data on private companies. How does that work? How are you able to compare how private companies would score on ESG, and what is the benefit for investors?
Lydia: I’ll take the last part first. If you look across the investable universe, more and more assets are flowing into private markets, and not only from ultra-high-net-worth individuals; it’s reaching further down the investment spectrum. So it is an important part of investors’ portfolios. Our integration with eFront helps the public and private markets talk to each other and learn what those aspects are, and that’s where I’ll hand it over to Ron to explain more about how that works.
Ron: Sure. First of all, today we don’t collect data on private companies. Our integration with eFront, BlackRock and Aladdin allows BlackRock’s Aladdin eFront to compare the information eFront has on private companies to public companies. So it allows you to ask: how would this company behave if it were a public company?
Craig: So you don’t collect any data, but you are integrating with eFront.
Ron: Exactly, exactly. Exactly.
Craig: So are you bringing data in from eFront?
Ron: Yes. The data flows within Aladdin: we provide the benchmarks, and private companies are ranked according to where they would rank if they were a public company.
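One way to read “where would they rank if they were a public company” is to place a private company’s metric into the distribution of its public peers. A minimal Python sketch, assuming a single metric where higher is better and a percentile defined as the share of public peers the private company beats; the function name and numbers are hypothetical, and this does not describe how the Aladdin and eFront integration actually computes its benchmarks.

```python
from bisect import bisect_left
from typing import List


def public_equivalent_percentile(private_value: float, public_peer_values: List[float]) -> float:
    """Percentile (0-100) a private company would hold among its public peers.

    Assumes higher values are better; the result is the share of public peers
    whose value is strictly below the private company's value.
    """
    if not public_peer_values:
        raise ValueError("need at least one public peer")
    peers = sorted(public_peer_values)
    beaten = bisect_left(peers, private_value)  # number of peers strictly below
    return 100.0 * beaten / len(peers)


if __name__ == "__main__":
    public_peers = [40.0, 55.0, 60.0, 72.0, 90.0]  # hypothetical public-company scores
    print(public_equivalent_percentile(65.0, public_peers))  # 60.0
```

Because the private company is not itself part of the public peer group, the percentile is taken over all of the peers, and the public benchmark stays unchanged no matter how many private companies are placed into it.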
Craig: That seems really, really helpful. Moving on to another use case: you’re also working with Addepar. Can you talk about what you’re doing there?
Lydia: Yeah, sure. Addepar tends to be more of a wealth management use case; it’s an advisor’s workstation. What we’ve integrated there is our ESG impact data set, so advisors will be able to pull in the high-level E, S, G and total ESG scores across their portfolio, alongside the other important metrics they’re used to looking at and explaining to their clients. What we’re working on with Addepar now is what comes next. We have various solutions, whether focused on climate change or on values and preferences, and how those translate into an advisor’s portfolio. So that’s just step one, with much more on the roadmap there. But I’m also happy to go back and share a little more about Aladdin.
Craig: Go ahead. We have a couple more minutes. What else do you want to say about Aladdin?
Lydia: Well, I’ll just go back to that point about trust, and how getting the data right and making it transparent builds trust with our clients. I also think having Aladdin as a very important partner, with BlackRock also an investor in our company, demonstrates a lot of that trust. BlackRock is the largest asset manager, and Aladdin is one of the most widely used software platforms across the financial industry, so their entrusting our data to their system speaks volumes, I think. Right now, that integration is primarily focused on asset managers, and we’re giving them universal APIs across our different solutions. However, it doesn’t stop there. We’ve also integrated widgets that their clients are finding very helpful; we have in-house designers helping translate that, as well as template views. So when we think about integration, it’s not just “here’s the data,” it’s “here’s how your asset managers and the users of your system can better use this data and apply it to their needs.”
Craig: Now we’re talking about something we’re really interested in. If I’m a broker-dealer or an enterprise wealth management firm, what can I do with your APIs, what can I use them for, and what kind of data can I get out of them?
Lydia: Sure. You can get data across our universal coverage and apply it to your point-in-time decision making, putting it up against your other suitability metrics or mandates, whether you’re managing around a client’s specific investment policy statement or a product’s mandate. You can also pull different APIs that are focused, let’s say, one on climate, one on the UN Sustainable Development Goals (SDGs), one on impact, one on risk, as well as regulation. I know you mentioned we don’t have much regulation in the US, but in Europe it is already alive and strong today, and it’s really driving a lot of this demand on both the asset manager and the wealth manager side. So I think that’s going to continue to be an important theme for what you’re able to accomplish through our API integrations and our solutions.
Craig: That’s excellent. And there was one other use case you wanted to talk about, a top-10 global bank. What have you done for them?
Lydia: Sure. I’ll touch briefly on the way we’re integrated there, because it’s interesting. Initially it was through our web app, then through our APIs via Aladdin, and they were using that more in the asset management function. However, we’re also talking with them about a wealth management approach and more of a reporting function. A lot of large organizations today have both types of functions within their ecosystem, so being able to play flexibly across those multiple systems, and having direct integration across the business, has been very helpful in solving their needs, in addition to our partnership with Aladdin. Ron, if you want to chime in on that top-10 global bank, feel free, because you were involved in that.
Ron: Sure. I think the interesting use case they had was customization of strategies. They had an investment strategy they wanted to implement that involved exposures to things like weapons manufacturers, cigarettes and alcohol, those kinds of values-based investments, combined with ESG risk and with controversies, since they didn’t want environmental controversies. So they set up unique strategies, customized to the goals of their clients, within Clarity, and they now apply those strategies and make decisions based on them.
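The bank’s customized strategy combines three kinds of rules: values-based exclusions, an ESG risk requirement and a controversy filter. A minimal Python sketch of such a screen, assuming each company carries a set of business-activity tags, an ESG risk score on a 0 to 100 scale where higher is better, and a count of environmental controversies; all field names, thresholds and companies are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Company:
    name: str
    activities: FrozenSet[str]        # business-activity tags, e.g. {"tobacco"}
    esg_risk_score: float             # assumed 0-100 scale, higher = better
    environmental_controversies: int  # count of environmental controversies


@dataclass(frozen=True)
class Strategy:
    excluded_activities: FrozenSet[str]
    min_esg_risk_score: float
    max_environmental_controversies: int = 0


def apply_strategy(universe: List[Company], strategy: Strategy) -> List[str]:
    """Return the names of companies that pass every rule of the strategy."""
    eligible = []
    for company in universe:
        if company.activities & strategy.excluded_activities:
            continue  # values-based exclusion (weapons, tobacco, alcohol, ...)
        if company.esg_risk_score < strategy.min_esg_risk_score:
            continue  # ESG risk requirement
        if company.environmental_controversies > strategy.max_environmental_controversies:
            continue  # controversy filter
        eligible.append(company.name)
    return eligible


if __name__ == "__main__":
    universe = [
        Company("A", frozenset({"tobacco"}), 80.0, 0),
        Company("B", frozenset(), 75.0, 0),
        Company("C", frozenset(), 40.0, 0),
        Company("D", frozenset(), 90.0, 2),
    ]
    strategy = Strategy(frozenset({"weapons", "tobacco", "alcohol"}), min_esg_risk_score=50.0)
    print(apply_strategy(universe, strategy))  # ['B']
```

Keeping each client’s rules in a separate `Strategy` value means the same universe can be screened against many customized strategies without changing the screening code.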
Craig: That’s excellent. I can see how that would be really helpful for a lot of their investing, benefiting their clients and helping them build portfolios that are more aligned with their goals. And I think that is all the time we have. Can you tell us where people listening can find out more about Clarity AI?
Ron: Yes, you can find us on our website, which is Clarity.ai. You can get hold of us through the website, as well as learn much more about what we’re focused on, who our clients are and what our solutions are.
Craig: Great, guys, thanks so much for being on the program. Really appreciate it.
Lydia: Thanks, Craig.
Ron: Thank you.