Intelligencia’s Vangelis Vergetis on Building a Successful Drug Pipeline
This week Harry sits down with Vangelis Vergetis, the co-founder and co-executive director of Intelligencia.ai, a startup that uses big data and machine learning to help pharmaceutical companies make better decisions throughout the drug development process. Vergetis argues that if you put a group of pharma executives in a conference room, then add an extra chair for a machine-learning system, the whole group ends up smarter—and able to make more accurate predictions about which drug candidates will succeed and which will fail.
Bringing better analytics into the pharma industry has been an uphill battle, Vergetis says. One survey by McKinsey, his former employer, showed that financial services companies were the most likely to adopt AI and machine learning tools; the least likely were the building and construction trades. But just one rung up from the bottom was healthcare and pharmaceuticals. “The impact that AI could have on health care is enormous,” Vergetis says. “It’s in the trillions. But in terms of AI adoption, we are right above construction—and no offense to construction, but it’s not the most innovative industry.”
But with the proper data, machine learning algorithms can help drug makers form far more accurate predictions about the probability that a new drug will perform well in Phase I clinical trials, or whether a drug that’s succeeded in Phase I should be advanced to Phase II. “For years we’ve seen the productivity of R&D declining in our space in pharma and biotech, and I refuse to accept that,” Vergetis says. “In the era of a lot of data becoming available, in the era of us being able to use techniques like machine learning to do something with that data, there’s gotta be a way to reverse that trend.”
Please rate and review MoneyBall Medicine on Apple Podcasts! Here’s how to do that from an iPhone, iPad, or iPod touch:
1. Open the Podcasts app on your iPhone, iPad, or Mac.
2. Navigate to the page of the MoneyBall Medicine podcast. You can find it by searching for it or selecting it from your library. Just note that you’ll have to go to the series page which shows all the episodes, not just the page for a single episode.
3. Scroll down to find the subhead titled “Ratings & Reviews.”
4. Under one of the highlighted reviews, select “Write a Review.”
5. Next, select a star rating at the top — you have the option of choosing between one and five stars.
6. Using the text box at the top, write a title for your review. Then, in the lower text box, write your review. Your review can be up to 300 words long.
7. Once you’ve finished, select “Send” or “Save” in the top-right corner.
8. If you’ve never left a podcast review before, enter a nickname. Your nickname will be displayed next to any reviews you leave from here on out.
9. After selecting a nickname, tap OK. Your review may not be immediately visible.
Harry Glorikian: I’m Harry Glorikian, and this is MoneyBall Medicine, the interview podcast where we meet researchers, entrepreneurs, and physicians who are using the power of data to improve patient health and make healthcare delivery more efficient. You can think of each episode as a new chapter in the never-ending audio version of my 2017 book, “MoneyBall Medicine: Thriving in the New Data-Driven Healthcare Market.” If you like the show, please do us a favor and leave a rating and review at Apple Podcasts.
Harry Glorikian: My guest today is Vangelis Vergetis, the co-founder and co-executive director of Intelligencia. It’s a big-data analytics startup focused on the pharmaceutical industry. And the argument Vergetis makes to potential clients is that you can take any group of 10 drug development experts in a conference room, and make them a lot smarter by adding an eleventh chair for a machine-learning system.
Of course, there’s always an art to deciding which drug candidates should advance to clinical trials; which Phase 1 trials should advance to Phase 2; and so on. Decisions like that are risky and expensive, and you can’t make them without having a lot of old-fashioned experience and instinct around the table.
Even so, sometimes the experts are biased and the experience doesn’t apply. And there’s only so much data that humans can keep in their heads. And let’s be honest: if decision makers at the big drug companies were that smart and talented, they’d have more home runs and fewer strikeouts.
Vergetis argues that we’ve got the historical data and the computing power today to make far more informed predictions about which drug programs to push forward. And if more drug companies used those tools, he thinks, it might reverse the decline in R&D productivity.
In the conversation you’re about to hear, we talked about how Vergetis and his co-founder Dimitrios Skaltsas started Intelligencia; how they built their own datasets; how they work with clients; and why it is that he and I think a lot alike—to the point of using the same MoneyBall metaphor when we talk about transforming drug discovery and healthcare.
So here’s my conversation with Vangelis Vergetis.
Harry Glorikian: Vangelis, welcome to the show.
Vangelis Vergetis: Thank you. Very good to be here.
Harry Glorikian: You know, it’s interesting. I was looking at the company and looking at what you guys are doing. And I’ve probably talked to, I don’t know, close to 70 experts in different areas of healthcare, drug discovery, computer science, you know. Out of all those people, I honestly think you and your company Intelligencia might be the most exact reflection of the argument I was making in my 2017 book MoneyBall Medicine. In fact, I actually think you used the MoneyBall metaphor in your own talks. So I want to start out with having you explain the parallels between your company and what Billy Beane did at the Oakland A’s.
Vangelis Vergetis: It’s very funny you say this, Harry. By the way, when we started the company, what is it, three, three and a half years ago now, we had a slide actually: you know, baseball did it in the nineties; is it about time that healthcare does the same? And going through the MoneyBall analogy. So look, the easiest way to explain it is the analogy of how do you pick baseball players and build a winning baseball team, and how do you pick drug candidates and development programs and build a winning pipeline?
So, you know, back in the day, what baseball did was a lot of experts in a big conference room. And these guys have watched—and I say guys, because yeah, they were primarily guys—they watched, you know, thousands of baseball games each, and they had their own perspectives and views and biases and experience in terms of who’s a good baseball player and who’s not, and who they want on the team, and how they complement each other.
And that’s how they built a baseball team. And, you know, the kid comes in, the chubby kid, I think Jonah Hill, right, and tells Brad Pitt, or Billy Beane in real life, I think we can do this differently. And that’s a little bit of the analogy here. Look, it’s not a perfect analogy, like everything, right? But the analogy here is, when you design a clinical trial, or when you think about the pros and cons and the risks of a development program, how do you take that conversation from a room full of people, the oncology PhD, the statistician, the person who’s developed dozens of drugs in the past and so on, and inject some data science and machine learning capability into that conversation? There is art in drug development. We’ll be the first ones to acknowledge that, the same way there’s art in baseball. So I would not expect that room to get replaced by a machine in any shape or form, and definitely not in the near or even medium future. But the idea is, you know, if you have 10 people in the room, can you pull up an 11th chair, have the machine learning algorithms sit in that chair, and provide a very unbiased, data-driven perspective into that conversation? So that’s what we do.
Harry Glorikian: So, I want to get into some of the details, but I want to step back and fill in some history here for the people, and how Intelligencia got started. If I’m not mistaken, your background is computer science, not biology. Right? Okay. And your co-founder Dimitrios [Skaltsas] is trained in law. You both spent time at McKinsey. Is that where you guys met?
Vangelis Vergetis: So, both of those are good points. You have a former lawyer—which we don’t hold against him, we still like him very, very much—and a former computer scientist or electrical engineer who are running a company in drug development. Like, how does that work? A couple of things. As you rightly pointed out, we met at McKinsey. We were both part of the healthcare practice there. Initially I was in the US, Dimitrios was in Europe. We met 10 years before starting the company, just running client projects together. We kept in touch over the years. And at some point, I think it was 2014, Dimitrios moved to New York, moved to the US with McKinsey, and took some AI responsibilities. McKinsey was doing some internal AI. I think it was called McKinsey Solutions or something like that.
So we became closer when he was in New York. We were both in healthcare for the better part of the last decade, and we were looking for the opportunity. You know, what’s the area in drug development, or frankly in pharma more broadly, where we believe we can have an impact?
And it was partly us thinking through different areas, and frankly partly customers or clients coming to us. We were both at McKinsey, and we had done this study over and over again. Right? How do you design a better clinical trial? I had done this, I don’t know, two dozen times, maybe more. And clients kept asking McKinsey, or us, Hey guys, you know, we understand how you do this and you do it very well, but are you using machine learning? Are you using data? And after saying no for about, you know, 50 times, we said, okay, we should stop saying no and just go build the damn business. So here we are.
Harry Glorikian: Yeah, no, I know that. I mean, from my days running Scientia Advisors, clients ask the same questions over and over and over again, and you keep answering them. It’s great profitability, by the way, because you sort of know the answer. But you couldn’t have picked a harder space. This is not a trivial exercise, especially if you go back to 2014, when some of the data was not even truly available, or not in a format, or not labeled, and so on, right, compared to where we are today.
Vangelis Vergetis: We started the company basically in 2018. The biggest challenge, and I think you rightly put it, is getting your hands on the right data you need to answer the question you want to answer. And we took that view, by the way. Some people go differently, and I’ll admit my own biases. In a lot of places, what we’ve seen, particularly at some big pharma, because they’re sitting on a vast amount of their own data, whether it’s CTMS data or whatever clinical trial data they have, the exercise they mentally do is: okay, I have all this data, what questions can I answer? What can I do? And there’s a lot of value there. You can answer a lot of good questions. But sometimes the question you ask needs more data than what you have, and you’re kind of force-fitting it a little bit and saying, yeah, okay, but maybe I can answer most of it. Well, not really.
So we flipped it. We asked the question first. The question is: what is the risk of this clinical development program? Or, the flip side of it, how likely is it that this clinical program, or this drug, will eventually reach a patient, will eventually receive approval by the FDA and be used by a patient? Then we went there. We said, okay, if that’s the question, what data do we need to answer that question? Some of it is very easily accessible. Some of it is doable, but you need to build data pipelines, you need to clean it up, it’s a little bit messy, whatever. Some of it doesn’t exist, and we’ve got to build it from scratch. So if you do it the other way and say, what do I have, you’ll ignore that piece that doesn’t exist, the piece you have to build from scratch. You’re going to try to solve the problem with the other stuff.
And then you realize it’s not enough. So we asked the question first, and then we went very systematically to get all the data we needed to train the machine learning models to answer that question.
Harry Glorikian: Sounds like a consulting approach. What do we need to fill the two by two? So I totally get it. What are the biggest limitations you see right now from pharma’s current method of assessing clinical trial risk?
Vangelis Vergetis: Yeah, there are a few, and some are bigger, some are smaller. It’s hard to paint the whole industry with a broad brush, but there are some technical limitations that everybody has, as humanity, as a scientific community. Do we really understand human biology, drug biology, really well? I don’t know. We understand it well enough, but out of the total biological knowledge, we probably know this much. That’s one challenge, and it’s a technical or scientific challenge.
Another technical challenge is, and I think you put your finger on it, data availability. But it goes beyond, can I get my hands on the right data? Is it curated in a particular way? Is it well annotated? Is it labeled? Does it have the same quality? Is it consistent? You know, I take data from this genomic database, I pick data from that genomic database. Are they structured the same way? Can I just combine them, or how much work do I need to do to combine them?
Now, it’s a solvable problem. The understanding of biology is solvable over time, but not immediately. The technical aspect, can I make the data consistent, is solvable but incredibly painful, and very few people have the patience for it or are willing to do it. I mean, we’ve killed a lot of brain cells pulling that data together, but we’ve done it.
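The consistency problem Vergetis describes, two sources encoding the same quantity under different names and units, can be sketched in a few lines. This is purely illustrative: the sources, field names, and numbers below are invented for the example, not Intelligencia’s actual pipeline or any real database schema.

```python
# Two hypothetical sources describing gene expression, with different field
# names and units (all names and numbers here are invented for illustration).
source_a = [
    {"gene_symbol": "TP53", "expr_tpm": 12.4},     # expression in linear TPM
    {"gene_symbol": "KRAS", "expr_tpm": 3.1},
]
source_b = [
    {"Gene": "tp53", "expression_log2tpm": 3.63},  # log2(TPM + 1)
    {"Gene": "egfr", "expression_log2tpm": 1.04},
]

def harmonize(rows, gene_key, value_key, from_log2=False):
    """Map one source onto a shared schema: upper-case gene IDs, linear TPM."""
    out = []
    for r in rows:
        tpm = r[value_key]
        if from_log2:
            tpm = 2 ** tpm - 1  # undo the log transform
        out.append({"gene": r[gene_key].upper(), "tpm": tpm})
    return out

# Only after both sources share one schema is it safe to combine them.
combined = (harmonize(source_a, "gene_symbol", "expr_tpm")
            + harmonize(source_b, "Gene", "expression_log2tpm", from_log2=True))
```

The point of the sketch is the order of operations: normalize each source to one agreed schema first, then concatenate, rather than merging raw tables and patching inconsistencies afterward.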
And then there’s a third group, I think, of challenges that I would put under the broader, you know, cultural umbrella. There is what I call the “every drug is unique” syndrome. A lot of people out there will say, well, you know, there are so many differences between drugs and programs and all that, there’s no way you can use machine learning to estimate the success of this drug. Most of that is actually not true. Beyond that syndrome, there is, and it’s actually very interesting in the pharma industry particularly, or in biotech, the “I want to see very quick results. I want to try this AI thing, whatever this AI thing is. Let me try it for two, three months. Show something quick. If I can show a quick win, great. If not, I’ll throw it away. I don’t have the patience for it.”
And this is an industry that will not think twice about investing 10 years and a billion dollars to develop, forget clinical, in the preclinical world, to discover a new target or a new molecule that could cure Alzheimer’s or pancreatic cancer or something. So we are an industry that is very much into putting an enormous amount of resources, time, and patience into discovering a drug, but when it comes to incorporating an AI system, methodology, or model that may help us tremendously, we are impatient. “Three months. Let’s see what I can do. Oh, no results? Throw it away. I’ll never see it again.”
In all fairness, companies are getting better at this. Most of the large pharmas now have chief digital officers or chief innovation officers with a whole structure underneath them, and mandates, and all that. So I don’t want to be too pessimistic here. Right? There’s a lot of effort. And I think the industry, at the very least, has acknowledged it has a cultural barrier that needs to be overcome. I don’t think we’re fully there in how we overcome it. But we’re making progress.
Harry Glorikian: But it’s interesting, right? I look at existing big pharma and the lumbering way they sort of move forward in fits and starts. And, you know, do I want to disrupt my kingdom to implement this thing? I mean, there’s a lot of human psychology involved here, and a lack of understanding, right, of what this is and what it can do for them in different areas.
Then I look at the startups that literally from day one are totally data purpose-built, right? Everything they’re looking at is, “What’s the data? How do I label it? Where are we going to use it? How do I manipulate it?” I mean, literally, it is from the ground up. And I always think to myself, sooner or later, my bet is that the startup is going to outmaneuver the big guy.
I mean, Google started as a purpose-built entity, and it outstrips most of its competitors and reshapes industries. I always think it’s harder to take an existing entity and reprogram its DNA than to have a predesigned piece of DNA from day one.
Vangelis Vergetis: Harry, it’s an incredibly interesting thought, and I don’t have an answer for it. Only time will tell. I would expect some pharma companies, whether we’re talking about big pharma, you know, the big 10, the massive guys, or some of the others (in our industry, it’s very funny, even a mid-sized biotech is still a $20 billion business), I would bet some of them, to use your words, will adapt, will reprogram their DNA to some degree. A little bit painfully, it’s going to be a little bit slow, or they’re going to have some false starts, but somehow they’ll get there. Some others will just buy, and we’ve seen this in the industry, right? Interesting startup? I’ll just buy them. And a few of these have already happened. Flatiron was bought by, I believe it was Roche, right? Yes. There are many other similar examples; that’s probably one of the bigger, more prominent ones.
So I would expect this reprogramming of DNA will not fully happen organically. Some of it will happen by big pharma realizing, “Yeah, we need to play. If we’re not a data company a few years from now, we’ll be nowhere, right? How do we get there? Let’s get our stuff organized, and maybe we’ll go make a couple of select acquisitions, and eventually we’ll get there.”
So I think all of these flavors will materialize in some shape or form. Some companies will lose. Some companies will make the investments, hire the right people, make the right acquisitions, and continue to grow.
Harry Glorikian: Yeah. And I look at it as an analogy to, say, JP Morgan or Goldman Sachs. I mean, the amount of money that they’re spending trying to transition to this new capability, we’re not spending the same amount in pharma, for sure. Right? Not even close.
Vangelis Vergetis: I don’t know the actual amount of money, because I haven’t done the analysis, I haven’t seen numbers. But my former employer, McKinsey, has done quite a bit of work here. I think it was MGI. MGI is McKinsey’s think tank, the McKinsey Global Institute. They had done a lot of work on this, and I remember seeing a chart that I thought was mind-boggling. Which industries are way ahead in AI? I would say financial services, so the Goldmans and JP Morgans and Morgan Stanleys, and much of high tech, of course, and a few others. Who’s at the bottom? I think it was building materials or construction, which I get. Second from the bottom? Health care. It was literally that bad.
Well, it’s true, if you look at the data. The sad thing for me, the part that we need to think about as an industry, is the promise, the impact that AI can have in healthcare. And I’m talking about healthcare more broadly now, including hospitals and payers, not just drug development or pharma. The impact that AI can have on health care is enormous. It’s in the trillions. But in terms of AI adoption, we are right above construction, and no offense to construction, but it’s not the most innovative industry.
Harry Glorikian: So, this is why I love investing in this area. Some of the other opportunities are still incredible, don’t misunderstand me, but this is at its nascent stage in my mind, where the opportunity to move the ball forward is dramatic. Okay. Which brings me to the next question, which is, and you don’t have to name any names or anything like that, walk us through a real-world example of how you help a client in practice.
Vangelis Vergetis: Ooh. Maybe I’ll give you two examples. You asked for one, I’ll give you two. Actually, I’m gonna give you more, but let’s start with that.
So, we work with several flavors of customers, right? We serve some of the largest, you know, top-five big pharma companies. We serve some of the smaller, even private, biotechs. And we serve a bunch of the mid-sized biotechs or mid-sized pharma companies. One example that comes to mind is a specific program, so I’ll pick an actual example. It’s a Phase 2 asset, a Phase 2 program. It was a combination program, I believe for pancreatic [cancer], that our client was running. It was in Phase 2, and it had been going on for about a year, I want to say. So it was in the middle of Phase 2, and they were starting to see some interim results.
They hadn’t published anything. They were starting to see some interim results, but they were still waiting for the Phase 2 to complete. And there were basically three questions, with increasing degrees of difficulty, if you will. Question number one: how likely is it that this program, this combo, our molecule with, I believe it was chemo, for pancreatic cancer, will eventually reach a patient, will eventually receive regulatory approval by the FDA? That was question number one, which is our bread and butter. This is what our algorithms do. I’ll make up the number now: it’s, you know, 13%, which, by the way, for pancreatic cancer in Phase 2, is not bad.
The second question was, okay, now let’s start thinking forward. If at the end of Phase 2 we’re able to show ABC, how does that probability change? Because given the interim results we’ve seen, we have pretty decent conviction we’ll be able to show something in that range when it comes to OS or ORR or whatever endpoints we’re measuring. What will our probability change to? It’s 13 now; will it go to 20, or will it go to zero?
What if we managed to show something better, or something worse? So in that sense, we’re trying to calibrate and say, based on what we show at the end of Phase 2, how do we make a decision? Should we go to Phase 3 or not? Is it still too risky, and does it need to be derisked further? Or are we comfortable with the risk we’re taking, and willing to write a, you know, $200 million check to run a Phase 3 program?
So we did the simulations, if you will, the analysis to say, based on what your Phase 2 shows, here’s what you should expect your risk to be at the beginning of Phase 3. That was the second layer.
The third layer went even a step further and said, okay, let’s assume we are now comfortable moving forward. The risk is within what we’re willing to take, given the size of the prize, right? Because if you do get this drug approved, we estimate an enormous commercial potential, so we’re willing to take significant risk here. How should we do this? Help us think through how different choices for continuing our development program affect our chances for approval.
For example, should we run a smaller Phase 2b and then two large Phase 3 trials? Should we scrap the Phase 2b and go straight to a pivotal Phase 3 and do a much larger trial? And there are different trade-offs there that have to do with cost, time, and risk. So we helped them think through, from the middle of Phase 2 where they are today, how likely is it that the drug gets approved? How will that evolve once they publish results? And if they decide to move forward, what is the best path forward from a risk point of view? That’s one example. I’ll spare you the second one; I spent too long on the first.
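The what-if exercise Vergetis describes can be sketched as a toy: hold a probability model fixed and sweep the hypothetical Phase 2 readout to see how the predicted chance of approval moves. To be clear, this is not Intelligencia’s model; the logistic form, the weights, and every number below are invented for illustration.

```python
import math

# Toy logistic model: probability of eventual approval as a function of a
# couple of trial features. Weights and features are invented; a real model
# would be trained on historical trial outcomes.
WEIGHTS = {"intercept": -2.2, "orr": 4.0, "phase3_design_large": 0.3}

def approval_probability(orr, large_phase3=False):
    """Toy predicted probability of approval.

    orr: hypothetical objective response rate shown at end of Phase 2 (0-1).
    """
    z = (WEIGHTS["intercept"]
         + WEIGHTS["orr"] * orr
         + (WEIGHTS["phase3_design_large"] if large_phase3 else 0.0))
    return 1.0 / (1.0 + math.exp(-z))

# Scenario analysis: how does predicted risk move with the Phase 2 readout?
scenarios = {f"ORR={orr:.0%}": approval_probability(orr)
             for orr in (0.10, 0.25, 0.40)}
```

The shape of the exercise is what matters: the model stays fixed, and each hypothetical end-of-Phase-2 result is pushed through it to see whether the probability moves enough to justify the Phase 3 check.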
Harry Glorikian: So you’ve built this machine learning model, right? And I want to say there are at least a hundred factors: clinical trial design, outcomes, the regulatory process, you know, the biology itself that you mentioned, the history. You have to train a model like that. Where did you get the data to train this complex model?
Vangelis Vergetis: There’s no single source, and I wish there was. We’ve been through dozens of data sources now. It’s what I said at the very beginning, right? Some of the data was easy to get. For example, there is a bunch of data on ClinicalTrials.gov. Of course we have that, and everybody else has that. It’s very easy to get, and valuable, which is good.
There is some data that is publicly available, but where you need to spend a lot of time cleaning up and curating. Think of genomic databases, whether it’s TCGA or GTEx or, you know, dozens of other genomic databases. That takes a lot of analysis and a lot of processing and a lot of cleanup before you create features out of that data to put into your machine learning algorithms. So that’s probably a second group.
And a third group, which goes back to my initial point: not all the data you need to answer the question is available, so you have to build it yourself. We built it ourselves. An example is clinical trial outcomes. To our knowledge, and we looked hard, there is no data you can buy that captures, in an incredibly consistent, systematic way, all the outcomes of clinical trials in a particular therapeutic area for the last 20 years.
Take oncology, I’ll give you an example. There have been a few thousand trials in the last 20 years, let’s say since 2000. We need to know every endpoint that each trial measured. How many patients were in each patient cohort, or in each arm of the trial? What was the value of that endpoint? What ORR did they achieve? What OS did they achieve? Whatever. And when was it measured?
Because sometimes we say OS, overall survival, but was it measured at six months or 12 months? That’s one more layer of specificity in exactly how the endpoint was captured. And then you need the number: how many patients survived at the six-month mark, or whatever it is. So there’s all that stuff you need, and you need it not just for the trial or the program you’re assessing. That’s easy to do, right? It’s one program; we can get it from the pharma company themselves. You need it for every single trial that has ever succeeded in the past, and for every single trial that has ever failed. That’s how you train a machine learning algorithm. That was very painful.
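To make the curation concrete, here is one possible shape for such a record. This is a sketch only: the schema, field names, and example values are hypothetical, not Intelligencia’s actual data model. The idea it illustrates is the one in the passage above: each trial carries its endpoints, the timepoint at which each was measured, patient counts per arm, and the eventual regulatory outcome as the training label.

```python
from dataclasses import dataclass

@dataclass
class EndpointResult:
    name: str              # e.g. "ORR", "OS"
    timepoint_months: int  # when the endpoint was measured
    value: float

@dataclass
class TrialRecord:
    trial_id: str
    indication: str
    phase: int
    arms: dict       # arm name -> number of patients
    endpoints: list  # list of EndpointResult
    approved: bool   # eventual regulatory outcome = training label

# A made-up record, with a placeholder trial ID.
record = TrialRecord(
    trial_id="NCT00000000",
    indication="pancreatic cancer",
    phase=2,
    arms={"combo": 62, "chemo alone": 60},
    endpoints=[EndpointResult("OS", 6, 0.71),    # 71% alive at 6 months
               EndpointResult("ORR", 6, 0.18)],
    approved=False,
)

def to_feature_row(rec):
    """Flatten one curated trial record into a feature row for model training."""
    row = {"phase": rec.phase,
           "n_patients": sum(rec.arms.values()),
           "label": int(rec.approved)}
    for ep in rec.endpoints:
        # Keep the timepoint in the feature name so OS@6m and OS@12m differ.
        row[f"{ep.name}@{ep.timepoint_months}m"] = ep.value
    return row
```

Encoding the timepoint into the feature name is one way to capture the extra layer of specificity Vergetis mentions: overall survival at six months and at twelve months become distinct columns rather than one ambiguous “OS” value.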
We have a whole team in Athens, actually. If the name didn’t give it away, I’m from Greece originally. I’ve been in New York for about 25 years now, but I’m from Greece originally. So a lot of the team is based in Greece, and part of that team is very highly educated: PhDs in biology, oncology, immunology, pharmacology, all the ologies. And that team curates, in an incredibly systematic way, all that data before our data engineers and our machine learning team can take over to build models. Right?
So to answer your question in a short way: dozens of data sources, some easy to get, some much harder, with a lot of processing, and some we had to create from scratch.
Harry Glorikian: I was just thinking about what you were saying, that last piece we were discussing. I can imagine that for hospitals and doctors, if you could put that into an interesting matrix, they could get an interesting view into these drugs instead of memorizing it off the top of their heads. You know, I always find in these discussions with companies that have data, I can easily think of five other things to do once you’ve got the data source.
Vangelis Vergetis: We’ve been discussing this internally, both as a team and with our advisors, and even our customers at this point are coming to us saying, Hey guys, that’s amazing what you have. We’ll pay you money. Can we now do this? Can we now do that? Some of that we would love to do, and we’re entertaining it. Some of it, you know, we’re still a growing company, there are 40 of us total in the company, and you don’t want to get distracted by too many shiny objects. Find the right shiny objects and focus on a couple of them, but not too many.
So for some of them, we’ll say, look, we could do it, but we don’t have the time or the bandwidth today. Maybe later. For some of them we’ll say, yeah, that’s incredibly interesting, and we were planning to go there anyway; let’s do it faster together. So we’re discussing with one of our customers today about building something that goes beyond risk and starts thinking about the commercial implications of what happens when a drug actually gets approved. It’s not just predicting approval: can you predict anything in the commercial space, whether that’s revenue, reimbursement, market share, and so on?
Harry Glorikian: I want to pause the conversation for a minute to make a quick request.
If you’re a fan of MoneyBall Medicine, you know that we’ve published dozens of interviews with leading scientists and entrepreneurs exploring the boundaries of data-driven healthcare and research. And you can listen to all of those episodes for free at Apple Podcasts, or at my website glorikian.com, or wherever you get your podcasts.
There’s one small thing you can do in return, and that’s to leave a rating and a review of the show on Apple Podcasts. It’s one of the best ways to help other listeners find and follow the show.
If you’ve never posted a review or a rating, it’s easy. All you have to do is open the Apple Podcasts app on your smartphone, search for MoneyBall Medicine, and scroll down to the Ratings & Reviews section. Tap the stars to rate the show, and then tap the link that says Write a Review to leave your comments. It’ll only take a minute, but it’ll help us out immensely. Thank you!
And now back to the show.
Harry Glorikian: If you had to say, what is your defensible advantage, your special sauce? Like, what is it that you’re doing for pharma that they can’t somehow reproduce for themselves?
Vangelis Vergetis: That’s a great question, Harry. I will say a couple of things; some are softer, some are harder. On the softer side, and probably more important, by the way, is the persistent focus, the unrelenting pursuit of what we’re here to build. In a larger company, it’s too easy to lose focus: budgets get cut, people get reassigned, get promoted, change departments, move.
So it’s very hard to get a team together to focus on something for an extended period of time and only do that. That’s probably one thing when you compare it to a larger pharma company. The second thing would be bringing together people with very different expertise and experiences.
So if you go to our office in Athens, and not in the last year, given all the mess we’re all living in with coronavirus, but if you went before that, or hopefully very soon, it’s one room, and you have the data scientist sitting here, the oncology PhD right next to her, the data engineer right across, the drug developer sitting over there, the statistician there.
So it’s literally having all those people in one room, or in a series of rooms on one floor, let’s say, where they work together on the same topic. And it sounds a little bit mundane, and it sounds a little trite, but it makes a difference for the biologist to be listening in as the computer scientists or data scientists are talking about their models. I’m sitting here entering all the biological and clinical data from this New England Journal of Medicine article that I’m reading, and I actually understand how they use it and I can offer an idea. I can say, hey, actually, I can capture it in a way that will help you guys, given what you’re discussing. So all those things help.
So that’s the second element, which is a team that is diverse in many ways. So a diverse team, not just in the racial or any other perspective, but also in experiences and backgrounds.
And the third one, which is the more technical one, is the data we actually do have. It does take an enormous amount of time, a lot of people, and an enormous amount of effort to actually build and create the data cube that we have. Nobody else has this. It’s incredibly painful, but we’ve done it. So that does set us apart. There are companies out there that are trying to answer the same or very similar questions based on a much more limited set of data, and they fall short. They’re okay, but they fall short of our predictive power. Not because they’re doing anything wrong, not because they’re not good data scientists; all of those things are fine. They just don’t have the data we have.
Harry Glorikian: And so that brings me to the next question. In all of these models, there are little issues lurking throughout the process…
Vangelis Vergetis: Oh my God. There are so many. And some of them are bigger than others.
Harry Glorikian: Many, right, that you have to think through. That’s why whenever somebody says, oh yeah, I’ve got the perfect answer, I’m like, it’s impossible. Perfect? No. So what is the accuracy? If you say you have a predictive algorithm, first of all, what do you compare it against? And then, if I can put it that way, how does it do against the traditional way of making decisions? How do you measure your accuracy? And then do you go back and look at real-world evidence versus the system?
Vangelis Vergetis: Yeah. So we’ve done a few things that are very interesting. There is a standard metric for machine learning. Let’s not get too technical, and I don’t know how technical your audience is, but there’s the AUC, which is Area Under the Curve, meaning the area under the ROC curve. It’s pretty much a number between 0.5 and 1. I mean, technically it could be as low as 0.5, but that’s a silly case, so it’s a number between 0.5 and 1. The higher it is, the more predictive your model is. We are in the high eighties, low nineties, which is incredibly predictive for a problem this nuanced and this hard. If you do image recognition and you use deep learning for image recognition, you get close to 0.999.
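[Editor’s note: for listeners who want to see the metric concretely, here is a minimal, self-contained sketch of the AUC Vergetis describes, computed via its rank-based interpretation: the probability that a randomly chosen successful program receives a higher predicted score than a randomly chosen failed one. The labels and scores below are made up for illustration; they are not Intelligencia’s data.]

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U formulation:
    the fraction of (positive, negative) pairs where the positive
    example is scored higher (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: scores for 4 successful (1) and 4 failed (0) drug programs.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
print(roc_auc(labels, scores))  # → 0.875, i.e. the "high eighties" range
```

A value of 0.5 means the scores are no better than coin flips; 1.0 means every successful program outranked every failed one.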
These are very different problems. So with the standard AUC metric, we score very highly, and we’ve compared that with what others have published in the literature; we are higher than at least what we’ve seen published by others. Then you do the obvious things, right? What do you do? You say, okay, let me take a hundred trials or a hundred programs for which my algorithm predicts, let’s say, a 20 to 30% chance of success.
All right. So my algorithm says all of these hundred fall in the 20 to 30% range. Now let me follow them over time and see what happens. What do you want? Ideally you want about 25% of them to succeed, somewhere in the middle. And most often that’s what happens. So when we say zero to 10, on average, let’s say 7% of them succeed.
When we say 10 to 30, on average 22% succeed. When we say 30 to 50, on average 39% succeed. So you do that on a large number of trials, and then you start gaining confidence that, dammit, what this model is telling me actually reflects reality. Now, of course, these are averages, right? So there will be trials for which you say 5% and they succeed. And the obvious thing to say there, and what we like about this, actually, is that it’s a true probability measure. So 5%, what does it mean? 5% means one out of 20 should succeed. Otherwise it’s not 5%. If every trial for which you say 5% fails, well, it’s not 5%, it’s zero. So if you say 5%, you should have one out of 20 succeeding. You want to see that, and you do see that, which is good.
Similarly, if you go to a drug developer and you say 80%, they’ve never heard a higher number in drug development; those numbers rarely exist. So 80% to a drug developer means success. Well, no, it means two out of 10 will fail. So you want to see that. You run statistical checks, like the bins that I mentioned, Brier scores, AUC. You run a bunch of statistical tests, and you get very high predictive power.
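[Editor’s note: the calibration check Vergetis describes — bin the predictions, then compare the average predicted probability with the observed success rate in each bin — along with the Brier score he mentions, can be sketched in a few lines. The bin edges and the simulated data below are illustrative assumptions, not Intelligencia’s actual bins or results.]

```python
import random

def brier_score(outcomes, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes;
    lower is better, and a perfectly confident, always-right model scores 0."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)

def calibration_bins(outcomes, probs, edges=(0.0, 0.1, 0.3, 0.5, 1.0)):
    """Group predictions into probability bins and, for each bin, report the
    mean predicted probability next to the observed success rate."""
    report = []
    for lo, hi in zip(edges, edges[1:]):
        in_bin = [(y, p) for y, p in zip(outcomes, probs) if lo <= p < hi]
        if in_bin:
            mean_pred = sum(p for _, p in in_bin) / len(in_bin)
            obs_rate = sum(y for y, _ in in_bin) / len(in_bin)
            report.append((lo, hi, mean_pred, obs_rate, len(in_bin)))
    return report

# Simulate a well-calibrated predictor: outcomes are drawn at the stated rates.
random.seed(0)
probs = [random.random() for _ in range(5000)]
outcomes = [1 if random.random() < p else 0 for p in probs]
for lo, hi, pred, obs, n in calibration_bins(outcomes, probs):
    print(f"bin {lo:.1f}-{hi:.1f}: predicted {pred:.2f}, observed {obs:.2f}, n={n}")
print(f"Brier score: {brier_score(outcomes, probs):.3f}")
```

When predicted and observed rates line up bin by bin, as in the simulation, that is exactly the "when we say 10 to 30, about 22% succeed" pattern described above.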
Look, I’ll summarize it like this: at the beginning of phase two, which is pretty early in drug development, right, you still have five, six years of development ahead of you, the predictive power of our algorithms is about 90%. So we can tell you with 90% confidence that the probability we give you is the right probability. When we tell you 20, it’s 20; when we tell you 60, it’s 60. We don’t give you a one-or-zero estimate, we give you a number. And we’re 90% confident in that number.
Harry Glorikian: That’s a pretty bold statement. So, you know, let’s think about it here, though. Most of this stuff at some point has to be explainable, and explainability of the model is typically an issue in machine learning. So how have you designed it in a way where you can say, yeah, okay, this is why I got to this answer?
Vangelis Vergetis: It’s a great point. I wish we could do exactly what you said, but we can come close. So a couple of things. Culturally, and for the right reasons, if you go in front of the EVP of R&D in a large pharma company, or the head of portfolio, whatever, and you tell them the answer is 42, they’re going to throw you out of the room. They want to know, “Where is the 42 coming from? Why are you telling me this? I need to know what I can do about it. I need to understand it.” Which is very human, and it’s also the right thing.
So we run, by design, machine learning models that are explainable. And there is explainability work being done in the academic community even for, let’s say, deep learning models, which are still much less explainable than a random forest or a KNN or something like that. So we run explainable machine learning algorithms, and we spend a lot of time on explainability.
And if one goes on our platform or uses our software, you look at the number and then you literally click on a thing that says “explain to me why,” and you see all the features that contribute to that answer and how important each feature is. So the reason I’m telling you that your probability is 42 is because, on the positive side, and I’m making this up for a second: your target is a gene that’s highly expressed in the tissue you’re going after, let’s say the lung or the breast or the liver, whatever it is, in the cancerous tissue versus the healthy tissue. You’ve designed a very good trial with the right endpoints. It’s well sized with the number of patients you’re putting in. You have a biomarker, which is a good thing, and so on. And maybe we’ll also say, on the negative side, by the way, as a company you may not have that much experience in this particular disease area, so I’m dinging you a little bit. And the regulator hasn’t said anything special about you; you haven’t received any breakthrough designation or accelerated approval or anything like that. The gene you picked is highly expressed, but if it’s a first-in-class molecule, there have been no approvals in the past against that target. So that tells me it’s a little more risky than the 20th PD-1 in the market. So it will give you all that.
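[Editor’s note: as a rough illustration of this kind of “explain why” output, here is a toy logistic model whose prediction decomposes into signed per-feature contributions to the log-odds. The feature names, weights, and bias are entirely hypothetical, chosen only to mirror the factors Vergetis lists; Intelligencia’s real features and models are not public.]

```python
import math

# Hypothetical weights for a simple logistic model (illustrative only).
WEIGHTS = {
    "target_overexpression": 1.2,
    "trial_has_biomarker": 0.8,
    "endpoint_quality": 0.6,
    "sponsor_disease_experience": 0.7,
    "first_in_class_target": -0.9,
}
BIAS = -1.1

def explain(features):
    """Return the predicted probability plus each feature's signed
    contribution to the log-odds, largest magnitude first."""
    contribs = {k: WEIGHTS[k] * v for k, v in features.items()}
    logit = BIAS + sum(contribs.values())
    prob = 1 / (1 + math.exp(-logit))
    ranked = sorted(contribs.items(), key=lambda kv: -abs(kv[1]))
    return prob, ranked

prob, ranked = explain({
    "target_overexpression": 1.0,
    "trial_has_biomarker": 1.0,
    "endpoint_quality": 1.0,
    "sponsor_disease_experience": 0.0,  # company is new to this disease area
    "first_in_class_target": 1.0,       # no prior approvals against this target
})
print(f"probability of success: {prob:.2f}")
for name, c in ranked:
    print(f"  {name}: {'+' if c >= 0 else ''}{c:.2f}")
```

For linear models this decomposition is exact; for tree ensembles one would substitute feature importances or SHAP-style attributions, but the user-facing idea, a ranked list of positive and negative drivers behind the number, is the same.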
And people can do two things with that. One, and perhaps less important, but still important: it gives them confidence that they understand why the machine is telling them something. They can wrap their head around it and get more confident. Even though I can tell you, yeah, I’ve run the statistics and the predictive power is 90%, you want to be able to understand it. You want to touch it, you want to feel it, you want to understand why. So it does that.
The second thing it does is you might be able to do something about it. So back to the simulation, right? How do we help our customers? I can maybe assess for you what the difference will be if you use the biomarker versus not. If you have a larger trial with another arm or not. If you use this endpoint versus that endpoint. So you may be able to say, okay, I understand that the probability is 42%, but if I change these three things, can I make it 50? And those eight points in PTRS, in probability of approval, are massive in terms of NPV or whatever valuation measure you use.
Harry Glorikian: That was going to be one of my next questions. So you’re doing all this. Do they always act on the data, or in some cases do they make a different decision than what the model said?
Vangelis Vergetis: Both. And the model is not a black-or-white model, right? It’s not going to tell you do this or don’t do this, move to phase three or don’t move to phase two. I’ll give you an example. If you are in oncology and I tell you that this asset has an 80% probability of success versus a 60% probability of success, it probably doesn’t matter. You’re going to move ahead anyway. It’s high enough, and the risk is low enough; you might as well do it. So sometimes, at the extremes, it may not make a big difference. If I tell you it’s a 5% probability versus a 3% probability, do you actually care? It’s pretty damn low.
Now, in a lot of cases, though, they fall somewhere in the gray zone, and this is where a lot of other factors come in. What do we think of the commercial potential? What are our competitors doing? How does it fit broadly with the rest of our pipeline and all of the other assets, both the approved ones and the programs we have out there? So there are a lot of other considerations that go into deciding whether I move to phase three, or whether I de-risk it, or what I do.
But for the most part, what we’ve seen is our customers act on the information. They are able to take that information, enhance their decision-making process, and make, at the end of the day, a better decision: either because they stopped something they should have stopped, or they progressed something they should have progressed, or they designed the trial a little bit differently, or they put a program in place that maximizes the potential of the asset they have in their pipeline.
So all of those things happen. The last thing I’ll say, Harry, and this is where we see a lot of action as well, is in business development. A lot of our work is in R&D, that is, pharma companies developing their own molecules, but we see two more areas where this approach is gaining a lot of steam.
One is business development. So now I’m looking not at my own pipeline, but to identify programs out there that I may want to go buy, or partner with, or in-license. For example, we worked with a customer early on, in phase one, and they said: for a particular indication, RA or IBD or Parkinson’s or pancreatic cancer, whatever indication I care about, what are the innovative, first-in-class assets in phase one, the risky stuff? What are the phase one programs out there that, one, are scientifically innovative? I don’t want the me-too drugs; I don’t want the 21st PD-1 in the market; I want something innovative. And two, can I see that list ranked from a risk point of view, or from an attractiveness point of view? You know, some have a 2% chance of approval, some have a 20% chance of approval. Well, I want to talk about the 20.
And we’ve helped customers identify molecules and programs like that, where they go and have a conversation with a biotech in South San Francisco, or in Zurich, Switzerland, or in Tokyo, or wherever, about in-licensing or partnerships or acquisitions or whatever it is. So we’ve seen quite a bit of action there.
Harry Glorikian: If machine learning takes hold in drug development, what’s the big-picture outcome? What do you think? Is it the Intelligencias of the world that are going to change the dynamic? Is it going to be the companies themselves? I believe this is going to have a profound impact on how things are done and what goes forward.
Vangelis Vergetis: Here’s what I’d love to see, Harry. For years, and there’s been some change recently, we’ve seen the productivity of R&D declining in our space, in pharma and biotech. I refuse to accept that. In the era of a lot of data becoming available, in the era of us being able to use techniques like machine learning to do something with that data, there’s gotta be a way to reverse that trend, that declining trend in R&D productivity, and see it going up again. Who benefits? Patients, who see better drugs reaching them faster and curing disease. And of course the broader community of pharma companies, biotechnology companies, and so on. So the big picture is, I’d love to see the productivity of R&D in our space increase.
And whether it’s Intelligencia that helps drive that, and I’m hoping, and I’m sure, we will, or others, that’s great. We all need to think through how we reverse the trend. So in pharma, in drug development, I see that as the big picture: how do I pick the winners? How do I invest behind the winners? How do I make sure I don’t create biases along the way, where I miss some of the drugs that would have existed had I made the right choice, and make my R&D dollars and R&D hours and effort much more productive, at the end of the day, for delivering drugs to the people that need them?
Harry Glorikian: So I saw you were quoted in a report from a law firm called Orrick that I liked. I think you were paraphrasing Derek Lowe from Novartis where you said, “It is not that AI will replace drug developers. It’s that the drug developers who use AI will replace those who don’t.” And coming back to the beginning, you know, do you think this is happening across the board in all businesses? Whether it’s on experimental drugs or winning baseball teams.
Vangelis Vergetis: Yeah, it’s a great question. Look, I think it is happening across all industries, but each industry is different. So I think the scale of impact and the scale of adoption to date are very different across industries.
We used construction as an example earlier. If you think about construction, the impact that AI will have on construction is not zero. A friend and mentor of mine runs a cement business, and, I’m not joking, they’re using AI in cement production to make it more environmentally friendly, increase productivity, all those things. So yes, there will be impact. But it’s going to be less in construction and building materials than it is in healthcare.
Or it’s going to be different in financial services, let’s say, than it is in travel and tourism. Again, there are opportunities for machine learning in travel and tourism, but probably fewer than in banking, or financial services broadly, or healthcare.
To attempt to answer your question, because I don’t actually know what the answer is, I can tell you my bias, or my view. One, yes, it will be used across industries, but the scale of impact will be materially different, whether you’re in healthcare or in travel.
And two, the adoption to date is very different. All this excitement about AI, and all this energy, and all this impact that it can have, it’s fantastic, and it will have it. But let’s also be thoughtful here, and I think we all are. You need experts. There’s a lot of art in a lot of things that happen. There’s art in drug development, there’s art in baseball, there’s art in a lot of things. There are instincts, gut feels, that humans have. Some of it is bad because it’s biased, but some of it isn’t. There are decisions that doctors make every day as they treat patients, forget drug development, that yes, can be made better by AI, maybe guided by AI, but I’m not sure an AI will take over a physician’s job anytime soon.
Harry Glorikian: No, I mean, I think the two together, at least right now, will equate to a step-wise function up, right? The AI may not miss a piece of data that the physician didn’t see. I’ve been with physicians where they made a call and they were missing a piece of data; had they had that data, the decision would have been different. The machine isn’t necessarily going to miss that last piece. And so I think the two together can be much more powerful than either one alone.
Vangelis Vergetis: Yeah. And it varies a lot by the use case. Meaning, can a machine read a lung image, or can it tell me if this picture is a dog or a cat? Yeah, it can probably do that better than a human, or equally well. But in use cases that are much more intricate than looking at an image, whether it’s building a baseball team or designing a phase three trial or anything approaching that level of complexity, the two need to come together, and will for a long time to come. So I think Derek is right in that sense: the drug developers that use AI will replace the ones that don’t, but AI by itself is not going to replace everybody. Not anytime soon.
Harry Glorikian: Yep. I agree. Well, listen, it was great to speak to you. I look forward to continuing our conversation, because I can see that there’s many areas of overlap. And it’s been great.
Vangelis Vergetis: Thank you, Harry. I appreciate it.
Harry Glorikian: Thank you.
Vangelis Vergetis: Bye.
Harry Glorikian: That’s it for this week’s show. You can find past episodes of MoneyBall Medicine at my website, glorikian.com, under the tab “Podcast.” And you can follow me on Twitter at hglorikian. Thanks for listening, and we’ll be back soon with our next interview.