Eric Daimler at Conexus says Forget Calculus, Today’s Coders Need to Know Category Theory
Harry’s guest Eric Daimler, a serial software entrepreneur and a former Presidential Innovation Fellow in the Obama Administration, has an interesting argument about math. If you’re a young person today trying to decide which math course you’re going to take, or maybe an old person who just wants to brush up, he says you shouldn’t bother with trigonometry or calculus. Instead he says you should study category theory. Increasingly important in computer science, category theory is about the relationships between sets or structures. It can be used to prove that different structures are consistent or compatible with one another, and to prove that the relationships in a dataset are still intact even after the data has been transformed in some way. Together with two former MIT mathematicians, Daimler co-founded a company called Conexus that uses category theory to tackle the problem of data interoperability.
Longtime listeners know that data interoperability in healthcare, or more often the lack of interoperability, is a repeating theme of the show. In fields from drug development to frontline medical care, we’ve got petabytes of data to work with, in the form of electronic medical records, genomic and proteomic data, and clinical trial data. That data could be the fuel for machine learning and other kinds of computation that could help us develop drugs faster and make smarter decisions about care. The problem is, it’s all stored in different databases and formats that can’t be safely merged without a nightmarish amount of work. So when someone like Daimler says they have a way to use math to bring heterogeneous data together without compromising that data’s integrity – well, it’s time to pay attention. That’s why on today’s show, we’re all going back to school for an introductory class in category theory.
Please rate and review The Harry Glorikian Show on Apple Podcasts! Here’s how to do that from an iPhone, iPad, or iPod touch:
1. Open the Podcasts app on your iPhone, iPad, or Mac.
2. Navigate to The Harry Glorikian Show podcast. You can find it by searching for it or selecting it from your library. Just note that you’ll have to go to the series page which shows all the episodes, not just the page for a single episode.
3. Scroll down to find the subhead titled “Ratings & Reviews.”
4. Under one of the highlighted reviews, select “Write a Review.”
5. Next, select a star rating at the top — you have the option of choosing between one and five stars.
6. Using the text box at the top, write a title for your review. Then, in the lower text box, write your review. Your review can be up to 300 words long.
7. Once you’ve finished, select “Send” or “Save” in the top-right corner.
8. If you’ve never left a podcast review before, enter a nickname. Your nickname will be displayed next to any reviews you leave from here on out.
9. After selecting a nickname, tap OK. Your review may not be immediately visible.
That’s it! Thanks so much.
Harry Glorikian: Hello. I’m Harry Glorikian, and this is The Harry Glorikian Show, where we explore how technology is changing everything we know about healthcare.
My guest today is Eric Daimler, a serial software entrepreneur and a former Presidential Innovation Fellow in the Obama Administration.
And he has an interesting argument about math.
Daimler says if you’re a young person today trying to decide which math course you’re going to take, or maybe an old person who just wants to brush up, you shouldn’t bother with trigonometry or calculus.
Instead he says you should study category theory.
That’s a field that isn’t even part of the curriculum at most high schools.
But it’s increasingly important in computer science.
Category theory is about the relationships between sets or structures.
It can be used to prove that different structures are consistent or compatible with one another, and to prove that the relationships in a dataset are still intact even after you’ve transformed that data in some way.
Together with two former MIT mathematicians, Daimler co-founded a company called Conexus that uses category theory to tackle the problem of data interoperability.
Now…longtime listeners of the show know that data interoperability in healthcare, or more often the lack of interoperability, is one of my biggest hobby horses.
In fields from drug development to frontline medical care, we’ve got petabytes of data to work with, in the form of electronic medical records, genomic and proteomic data, and clinical trial data.
That data could be the fuel for machine learning and other kinds of computation that could help us develop drugs faster and make smarter decisions about care.
The problem is, it’s all stored in different databases and formats that can’t be safely merged without a nightmarish amount of work.
So when someone like Daimler says they have a way to use math to bring heterogeneous data together without compromising that data’s integrity – well, I pay attention.
So on today’s show, we’re all going back to school for an introductory class in category theory from Conexus CEO Eric Daimler.
Harry Glorikian: Eric, welcome to the show.
Eric Daimler: It’s great to be here.
Harry Glorikian: So I was reading your varied background. I mean, you’ve worked in so many different kinds of organizations. I’m not sure that there is a compact way or even an accurate way to describe you. So can you describe yourself? You know, what do you do and what are your main interest areas?
Eric Daimler: Yeah, I mean, the easiest way to describe me might come from my mother. You know, somebody asked her, is that the doctor? And she says, Well, yes, but he’s not the type that helps people. So, you know, I’ve been doing research around artificial intelligence from a lot of different perspectives, around my research in graph theory and machine learning and computational linguistics. I’ve been a venture capitalist on Sand Hill Road. I’ve done entrepreneurship, and I started a couple of businesses, which I’m doing now. And most notably I was doing policy in Washington, D.C. as part of the Obama administration for a time. So I am often known for that last part. But my background really is rare, if not unique, for having the exposure to AI from all of those angles, from business, academia and policy.
Harry Glorikian: Yeah. I mean, I was looking at the obviously the like you said, the one thing that jumped out to me was the you were a Presidential Innovation Fellow in the Obama administration in 2016. Can you can you give listeners an idea of what is what is the Presidential Innovation Fellowship Program? You know, who are the types of people that are fellows and what kind of things do they do?
Eric Daimler: Sure. I guess with that sort of question, it’s helpful then to give a broader picture, even how it started. There was a program started during the Nixon administration that’s colloquially known as the Science Advisers to the President, you know, a bipartisan group to give science advice to the president. That’s called the OSTP, Office of Science and Technology Policy. There are experts within that group that know everything from space to cancer, to be super specific to, in my domain, computer security. And I was the sole authority during my time in artificial intelligence. So there are other people with other expertise there. There are people in different capacities. You know, I had the particular title that I had, and that was a one-year term. The staffing for these things goes up and down, depending on the administration, in ways that you might be able to predict and guess. The people with those titles also find themselves in different parts of the executive branch. So they will do a variety of things that are not predicted by the title of the fellow. My particular role that I happened to be doing was in helping to coordinate on behalf of the President, humbly, on behalf of the President, their research agenda across the executive branch. There are some very able people with whom I had the good fortune of working during my time there, some of whom are now in the Biden administration. And again, it’s to be a nonpartisan effort around artificial intelligence. Both sides should really be advocates for having our research agenda in government be most effective.
But my role was coordinating such things as, really this is helpful, the definition of robotics, which you might be surprised by as a reflex but but quickly find to be useful when you’re thinking that the Defense Department’s definition and use, therefore, of robotics is really fundamentally different than that of health and human services use and a definition of robotics and the VA and Department of Energy and State and and so forth.
Eric Daimler: So that is we find to be useful, to be coordinated by the Office of the President and experts speaking on behalf. It was started really this additional impulse was started after the effects of, I’ll generously call them, of healthcare.gov and the trip-ups there where President Obama, to his great credit, realized that we needed to attract more technologists who know category theory into government, that we had a lot of lawyers to be sure we had, we had a ton of academics, but we didn’t have a lot of business people, practical technologists. So he created a way to get people like me motivated to come into government for short, short periods of time. The the idea was that you could sit around a cabinet, a cabinet meeting, and you could you would never be able to raise your hand saying, oh, I don’t know anything about economics or I don’t know anything about foreign policy, but you could raise your hand and say, Oh, I don’t know anything about technology. That needs to be a thing of the past. President Obama saw that and created a program starting with Todd. Todd Park, the chief technologist, the second chief technology officer of the United States, is fantastic to to start to start some programs to bring in people like me.
What is category theory?
Category theory is a branch of mathematics that deals with the abstract study of mathematical structures and their relationships. It provides a general framework for analyzing the concepts of objects and arrows (also called morphisms) in various mathematical domains, such as algebra, geometry, and topology. The idea is to identify common structures that underlie different mathematical areas and to study these structures in a unified way. Category theory is used to unify and simplify the study of mathematics, and has applications in computer science, logic, and physics.
Harry Glorikian: Oh, yeah. And believe me, in health care, we need we need more technologists, which I always preach. I’m like, don’t go to Facebook. Come here. You know, you can get double whammy. You can make money and you can affect people’s lives. So I’m always preaching that to everybody. But so if I’m not mistaken, in early 2021, you wrote an open letter to the brand new Biden administration calling for sort of a big federal effort to improve national data infrastructure. Like, can you summarize for everybody the argument in that piece and. Do you see them doing any of the items that you’re suggesting?
Eric Daimler: Right. The the idea is that despite us making some real good efforts during the Obama administration with solidifying our, I’ll say, our view on artificial intelligence across the executive, and this continuing actually into the Trump administration with the establishment of an AI office inside the OSTP. So credit where credit is due. That extended into the the Biden administration, where some very well-meaning people can be focusing on different parts of the the conundrum of AI expressions, having various distortions. You know, the popular one we will read about is this distortion of bias that can express itself in really ugly ways, as you know, as individuals, especially for underrepresented groups. The point of the article was to help others be reminded of of some of the easy, low hanging fruit that we can that we can work on around AI through category theory. So, you know, bias comes in a lot of different ways, the same way we all have cognitive distortions, you know, cognitive biases. There are some like 50 of them, right. You know, bias can happen around gender and ethnicity and age, sexual orientation and so forth. You know, it all can also can come from absence of data. There’s a type of bias that’s present just by being in a developed, rich country in collecting, for example, with Conexus’s customers, my company Conexus’s customers, where they are trying to report on their good efforts for economic and social good and around clean, renewable energies, they find that there’s a bias in being able to collect data in rich countries versus developing countries, let alone clinical data management.
Eric Daimler: That’s another type of bias. So that was that was the point of me writing that open letter, to prioritize, these letters. It’s just to distinguish what the low hanging fruit was versus some of the hard problems. The, some of the low hanging fruit, I think is available, I can say, In three easy parts that people can remember. One is circuit breakers. So we we can have circuit breakers in a lot of different parts of these automated systems. You know, automated car rolling down a road is, is the easiest example where, you know, at some point a driver needs to take over control to determine to make a judgment about that shadow being a person or a tumbleweed on the crosswalk, that’s a type of circuit breaker. We can have those circuit breakers in a lot of different automated systems. Another one is an audit. And the way I mean is audit is having people like me or just generally people that are experts in the craft being able to distinguish the data or the biases can become possible from the data model algorithms where biases also can become possible. Right. And we get a lot of efficiency from these automated systems, these learning algorithms made thanks to category theory. I think we can afford a little bit taken off to audit the degree to which these data models are doing what we intend.
Eric Daimler: And an example of a data model is that Delta Airlines, you know, they know my age or my height, and I fly to San Francisco, to New York or some such thing. The data model would be their own proprietary algorithm to determine whether or not I am deserving of an upgrade to first class, for example. That’s a data model. We can have other data models. A famous one that we all are part of is FICO scores, credit scores, and those don’t have to be disclosed. None of us actually know what Experian or any of the credit agencies used to determine our credit scores. But they they use these type of things called zero knowledge proofs, where we just send through enough data, enough times that we can get to a sense of what those data models are. So that’s an exposure of a data model. A declarative exposure would be maybe a next best thing, a next step, and that’s a type of audit in the context of category theory.
Eric Daimler: And then the third low hanging fruit, I’d say, around regulation, and I think these are just coming towards eventualities, is demanding lineage or demanding provenance. You know, you’ll see a lot of news reports, often on less credible sites, but sometimes on on shockingly credible sites where claims are made that you need to then search yourself and, you know, people in a hurry just won’t do it, when these become very large systems and very large systems of information, alert systems of automation, I want to know: How were these conclusions given? So, you know, an example in health care would be if my clinician gave me a diagnosis of, let’s say, some sort of cancer. And then to say, you know, here’s a drug, by the way, and there’s a five chance, 5 percent chance of there being some awful side effects. You know, that’s a connection of causation or a connection of of conclusions that I’m really not comfortable with. You know, I want to know, like, every step is like, wait, wait. So, so what type of cancer? So what’s the probability of my cancer? You know, where is it? And so what drug, you know, how did you make that decision? You know, I want to know every little step of the way. It’s fine that they give me that conclusion, but I want to be able to back that up. You know, a similar example, just in everyday parlance for people would be if I did suddenly to say I want a house, and then houses are presented to me. I don’t quite want that. Although that looks like good for a Hollywood narrative. Right? I want to say, oh, wait, what’s my income? Or what’s my cash? You know, how much? And then what’s my credit? Like, how much can I afford? Oh, these are houses you can kind of afford. Like, I want those little steps or at least want to back out how those decisions were made available. That’s a lineage. 
So those three things, circuit breaker, audit, lineage, those are three pieces of low hanging fruit that I think the European Union, the State of New York and other other government entities would be well served to prioritize.
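Editor's note: the lineage idea Daimler describes with the house example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Traced` class, the `affordable_price` function, and the income multiples are all invented here and do not come from the conversation or from any real lending rule. The point is simply that every intermediate step is recorded alongside the conclusion, so the conclusion can be backed out later.

```python
from dataclasses import dataclass, field


@dataclass
class Traced:
    """A value bundled with the ordered list of steps that produced it."""
    value: float
    steps: list[str] = field(default_factory=list)


def affordable_price(income: float, cash: float, credit_score: int) -> Traced:
    """Estimate a house budget while recording every intermediate decision.

    The multipliers below are invented for illustration only.
    """
    steps = [f"inputs: income={income}, cash={cash}, credit_score={credit_score}"]

    # Hypothetical rule: better credit allows a larger multiple of income.
    multiple = 4.0 if credit_score >= 700 else 3.0
    steps.append(f"credit_score {credit_score} -> income multiple {multiple}")

    loan = income * multiple
    steps.append(f"loan = income * multiple = {loan}")

    budget = loan + cash
    steps.append(f"budget = loan + cash = {budget}")
    return Traced(budget, steps)


result = affordable_price(income=100_000, cash=50_000, credit_score=720)
for step in result.steps:
    print(step)
print("budget:", result.value)
```

A reader of `result.steps` can see which inputs and which rule led to the budget, which is the "every little step of the way" that the lineage requirement asks for.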
Harry Glorikian: I would love all of them, especially, you know, the health care example, although I’m not holding my breath because I might not come back to life by how long I’d have to hold my breath on that one. But we’re hoping for the best and we talk about that on the show all the time. But you mentioned Conexus. You’re one of three co founders, I believe. If I’m not mistaken, Conexus is the first ever commercial spin out from MIT’s math department. The company is in the area of large scale data integration, building on insights that come out of the field of mathematics that’s called category algebra in category theory, categorical algebra, or something called enterprise category theory. And to be quite honest, I did have to Wikipedia to sort of look that up, was not familiar with it. So can you explain category theory and algebra in terms of a non mathematician and maybe give us an example that someone can wrap their mind around.
Eric Daimler: Yeah. Yeah. And it’s important to get into because even though what my company does is, Conexus does a software expression of categorical algebra with category theory, it’s really beginning to permeate our world. You know, the the way I tell my my nieces and nephews is, what do quantum computers, smart contracts and Minecraft all have in common? And the answer is composability. You know, they are actually all composable. And what composable is, is it’s kind of related to modularity, but it’s modularity without regard to scale. So the the easy analogy is in trains where, yeah, you can swap out a boxcar in a train, but mostly trains can only get to be a couple of miles long. Swap in and out boxcars, but the train is really limited in scale. Whereas the train system, the system of a train can be infinitely large, infinitely complex. At every point in the track you can have another track. That is the difference between modularity and composability. So Minecraft is infinitely self referential where you have a whole ‘nother universe that exists in and around Minecraft. In smart contracts is actually not enabled without the ability to prove the efficacy, which is then enabled by categorical algebra or its sister in math, type theory, similar to category theory. They’re kind of adjacent. And that’s similar to quantum computing. So quantum computing is very sexy. It gets in the press quite frequently with forks and all, all that. If it you wouldn’t be able to prove the efficacy of a quantum compiler, you wouldn’t actually. Humans can’t actually say whether it’s true or not without type theory or categorical algebra.
Eric Daimler: How you think of kind categorical algebra you can think of as a little bit related to graph theory, sort of similar to category theory. Graph theory is those things that you see, they look like spider webs. If you see the visualizations of graph theories are graphs. Category theory is a little bit related, you might say, to graph theory, but with more structure or more semantics or richness. So in each point, each node and each edge, in the vernacular, you can you can put an infinite amount of information. That’s really what category theory or categorical algebra allows. This, the discovery, this was invented to be translating math between different domains of math. The discovery in 2011 from one of my co-founders, who was faculty at MIT’s Math Department, was that we could apply that to databases. And it’s in that the whole world opens up. This solves the problem that that bedeviled the good folks trying to work on healthcare.gov. It allows for a good explanation of how we can prevent the next 737 Max disaster, where individual systems certainly can be formally verified. But the whole plane doesn’t have a mechanism of being formally verified with classic approaches. And it also has application in drug discovery, where we have a way of bringing together hundreds of thousands of databases in a formal way without risk of data being misinterpreted, which is a big deal when you have a 10-year time horizon for FDA trials and you have multiple teams coming in and out of data sets and and human instinct to hoard data and a concern about it ever becoming corrupted. This math and the software expression built upon it opens up just a fantastically rich new world of opportunity for for drug discovery and for clinicians and for health care delivery. And the list is quite, quite deep.
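Editor's note: one way to picture "graph theory with more structure" applied to databases is to treat a schema as a graph, with tables as nodes and foreign keys as edges, and then check whether a proposed mapping between two schemas keeps every relationship intact. The Python sketch below is a much simplified stand-in for that idea, not a description of Conexus's software. The schemas, table names, and the `preserves_structure` function are hypothetical.

```python
# A schema as a small graph: nodes are tables, edges are foreign keys.
# Each edge is stored as name -> (source_table, target_table).
Schema = dict[str, tuple[str, str]]

hospital_a: Schema = {
    "patient_of": ("Visit", "Patient"),
    "seen_by": ("Visit", "Clinician"),
}

hospital_b: Schema = {
    "subject": ("Encounter", "Person"),
    "provider": ("Encounter", "Staff"),
}

# A proposed mapping: where each table and each foreign key of A lands in B.
table_map = {"Visit": "Encounter", "Patient": "Person", "Clinician": "Staff"}
edge_map = {"patient_of": "subject", "seen_by": "provider"}


def preserves_structure(src: Schema, dst: Schema,
                        tables: dict[str, str], edges: dict[str, str]) -> bool:
    """Return True if every foreign key in src maps to a foreign key in dst
    whose endpoints are the images of the original endpoints."""
    for name, (source, target) in src.items():
        image = edges.get(name)
        if image is None or image not in dst:
            return False
        if dst[image] != (tables.get(source), tables.get(target)):
            return False
    return True


print(preserves_structure(hospital_a, hospital_b, table_map, edge_map))

# Swapping the two edge images breaks the endpoints, so the check fails.
bad_edges = {"patient_of": "provider", "seen_by": "subject"}
print(preserves_structure(hospital_a, hospital_b, table_map, bad_edges))
```

The check is mechanical: either the mapping respects every relationship or it does not, which is the sense in which data can be brought together "in a formal way" rather than by inspection.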
Harry Glorikian: So. What does Conexus provide its clients? Is it a service? Is it a technology? Is it both? Can you give us an example of it?
Eric Daimler: Yeah. So Conexus is software. Conexus is enterprise software. It’s an enterprise software platform that works generally with very large organizations that have generally very large, complex data infrastructures. You know the example, I can start in health care and then I can move to an even bigger one, was with a hospital group that we work with in New York City. I didn’t even know health care groups could really have this problem. But it’s endemic to really the world’s data, where one group within the same hospital had a particular way that they represented diabetes. Now to a layman, layman in a health care sense, I would think, well, there’s a definition of diabetes. I can just look it up in the Oxford English Dictionary. But this particular domain found diabetes to just be easily represented as yes, no. Do they have it? Do they not? Another group within the same hospital group thought that they would represent it as diabetes, how are we treating it? A third group would be representing it as diabetes, how long ago. And then a fourth group had some well-meaning clinicians that would characterize it as, they had it and they have less now or, you know, type one, type two, you know, with a more nuanced view.
Eric Daimler: The traditional way of capturing that data, whether it’s for drug discovery or whether it’s for delivery, is to normalize it, which would then squash the fidelity of the data collected within those groups. Or, most likely, they actually just wouldn’t do it. They wouldn’t collect the data, they wouldn’t bring the data together because it’s just too hard, it’s too expensive. They would use these processes called ETL, extract, transform, load, that have been around for 30 years but are often slow, expensive, fragile. They could take six months to a year, cost $1,000,000, deploy 50 to 100 people generally from Accenture or Deloitte or Tata or Wipro. You know, that’s a burden. It’s a burden, you know, so the data wasn’t available and that would then impair the researchers and their ability to share data. And it would impair clinicians in their view of patient care. And it also impaired the people in operations where they would work on billing. So we work with one company right now that works on 1.4 trillion records a year. And they just have trouble with that volume and the number of databases and the heterogeneous data infrastructure, bringing together that data to give them one view that then can facilitate health care delivery.
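Editor's note: the contrast between normalizing the four diabetes representations and bringing them together without losing detail can be illustrated with a small Python sketch. Everything here is hypothetical: the record layouts, the group names, and the `to_shared` function are invented for illustration, and the assumption that the three richer groups only record diagnosed patients is ours, not the speaker's.

```python
# Four hypothetical record shapes, one per group in the hospital example.
records = [
    ("yes_no", {"patient": "p1", "diabetes": True}),
    ("treatment", {"patient": "p2", "treatment": "insulin"}),
    ("onset", {"patient": "p3", "years_since_diagnosis": 4}),
    ("nuanced", {"patient": "p4", "type": "type 2"}),
]


def to_shared(record: dict, group: str) -> dict:
    """Map one group's record into a shared shape without discarding detail."""
    if group == "yes_no":
        has_it = record["diabetes"]
    else:
        # Assumption for this sketch: the other groups only record patients
        # who have a diagnosis, so the presence of a record implies diabetes.
        has_it = True
    detail = {k: v for k, v in record.items() if k not in ("patient", "diabetes")}
    return {"patient": record["patient"], "has_diabetes": has_it,
            "source": group, "detail": detail}


# Shared view: a common field plus each group's original detail.
shared = [to_shared(rec, group) for group, rec in records]

# Traditional normalization: collapse everything to the lowest common field.
normalized = [{"patient": s["patient"], "has_diabetes": s["has_diabetes"]}
              for s in shared]

for row in shared:
    print(row)
for row in normalized:
    print(row)
```

The `normalized` rows can answer only the yes/no question, while the `shared` rows can still answer each group's original question, which is the fidelity that normalization "squashes."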
Eric Daimler: The big example is, we work with Uber where they they have a very smart team, as smart as one might think. They also have an effectively infinite balance sheet with which they could fund an ideal IT infrastructure. But despite that, you know, Uber grew up like every other organization optimizing for the delivery of their service or product and, and that doesn’t entail optimizing for that infrastructure. So what they found, just like this hospital group with different definitions of diabetes, they found they happen to have grown up around service areas. So in this case cities, more or less. So when then the time came to do analysis — we’re just passing Super Bowl weekend, how will the Super Bowl affect the the supply of drivers or the demand from riders? They had to do it for the city of San Francisco, separate than the city of San Jose or the city of Oakland. They couldn’t do the whole San Francisco Bay Area region, let alone the whole of the state or the whole of the country or what have you. And that repeated itself for every business question, every organizational question that they would want to have. This is the same in drug discovery. This is the same in patient care delivery or in billing. These operational questions are hard, shockingly hard.
Eric Daimler: We had another one in logistics where we had a logistics company that had 100,000 employees. I didn’t even know some of these companies could be so big, and they actually had a client with 100,000 employees. That client had 1000 ships, each one of which had 10,000 containers. And I didn’t even know like how big these systems were really. I hadn’t thought about it. But I mean, they’re enormous. And the question was, hey, where’s our personal protective equipment? Where is the PPE? And that’s actually a hard question to ask. You know, we are thinking about maybe our FedEx tracking numbers from an Amazon order. But if you’re looking at the PPE and where it is on a container or inside of a ship, you know, inside this large company, it’s actually a hard question to ask. That’s this question that all of these organizations have.
Eric Daimler: In our case, Uber, where they they they had a friction in time and in money and in accuracy, asking every one of these business questions. They went then to find, how do I solve this problem? Do I use these old tools of ETL from the ’80s? Do I use these more modern tools from the 2000s? They’re called RDF or OWL? Or is there something else? They discovered that they needed a more foundational system, this category theory and categorical algebra that that’s now expressing itself in smart contracts and quantum computers and other places. And they just then they found, oh, who are the leaders in the enterprise software expression of that math? And it’s us. We happen to be 40 miles north of them. Which is fortunate. We worked with Uber to to solve that problem in bringing together their heterogeneous data infrastructure to solve their problems. And to have them tell it they save $10 million plus a year in in the efficiency and speed gains from the solution we helped provide for them.
Harry Glorikian: Let’s pause the conversation for a minute to talk about one small but important thing you can do, to help keep the podcast going. And that’s leave a rating and a review for the show on Apple Podcasts.
All you have to do is open the Apple Podcasts app on your smartphone, search for The Harry Glorikian Show, and scroll down to the Ratings & Reviews section. Tap the stars to rate the show, and then tap the link that says Write a Review to leave your comments.
It’ll only take a minute, but you’ll be doing a lot to help other listeners discover the show.
And one more thing. If you like the interviews we do here on the show I know you’ll like my new book, The Future You: How Artificial Intelligence Can Help You Get Healthier, Stress Less, and Live Longer.
It’s a friendly and accessible tour of all the ways today’s information technologies are helping us diagnose diseases faster, treat them more precisely, and create personalized diet and exercise programs to prevent them in the first place.
The book is now available in print and ebook formats. Just go to Amazon or Barnes & Noble and search for The Future You by Harry Glorikian.
And now, back to the show.
Harry Glorikian: So your website says that your software can map data sources to each other so that the perfect data model is discovered, not designed. And so what does that mean? I mean, does that imply that there’s some machine learning or other form of artificial intelligence involved, sort of saying here are the right pieces to put together as opposed to let me design this just for you. I’m trying to piece it together.
Eric Daimler: Yeah. You know, the way we might come at this is just reminding ourselves about the structure of artificial intelligence. You know, in the public discourse, we will often find news, I’m sure you can find it today, on deep learning. You know, whatever’s going on in deep learning because it’s sexy, it’s fun. You know, DeepMind really made a name for themselves and got them acquired at a pretty valuation because of their Hollywood-esque challenge to Go, and solving of that game. But that particular domain of AI, deep learning, deep neural nets, is itself just a subset of machine learning. I say just not to minimize it. It’s a fantastically powerful algorithm. But just to place it, it is a subset of machine learning. And then machine learning itself is a subset of artificial intelligence. That’s a probabilistic subset. So we all know probabilities are, those are good and bad. Fine when the context is digital advertising, less fine when it’s the safety of a commercial jet. There is another part of artificial intelligence called deterministic artificial intelligence. They often get expressed as expert systems. Those generally got a bad name with the flops of the early ’80s. Right. They flopped because of scale, by the way. And then the flops in the early 2000s and 2010s from IBM’s ill-fated Watson experiment, the promise did not meet the reality.
Eric Daimler: It’s in that deterministic A.I. that that magic is to be found, especially when deployed in conjunction with the probabilistic AI. That’s that’s where really the future is. There’s some people have a religious view of, oh, it’s only going to be a probabilistic world but there’s many people like myself and not to bring up fancy names, but Andrew Ng, who’s a brilliant AI researcher and investor, who also also shares this view, that it’s a mix of probabilistic and deterministic AI. What deterministic AI does is, to put it simply, it searches the landscape of all possible connections. Actually it’s difference between bottoms up and tops down. So the traditional way of, well, say, integrating things is looking at, for example, that hospital network and saying, oh, wow, we have four definitions of diabetes. Let me go solve this problem and create the one that works for our hospital network. Well, then pretty soon you have five standards, right? That’s the traditional way that that goes. That’s what a top down looks that looks like.
Eric Daimler: It’s called a Golden Record often, and it rarely works because pretty soon what happens is the organizations will find again their own need for their own definition of diabetes. In most all cases, that top down approach rarely works. The bottoms up approach says, Let’s discover the connections between these and we’ll discover the relationships. We don’t discover it organically, like we depend on people, because it’s deterministic. We discover it through a massive, you know, non-intuitive, in some cases it’s just kind of infeasible for us to explore a trillion connections. But what the AI does is it explores a factorial number, actually, is the technical equation for it, a factorial number of possible paths that then determine the map of relationships between entities. So imagine just discovering the US highway system. If you did that as a person, it’s going to take a bit. If you had some infinitely fast crawlers, robots discovering the highway system infinitely fast, remember, then that’s a much more effective way of doing it that gives you some degree of power. That’s the difference between bottoms up and tops down. That’s the difference between deterministic, really, we might say, and probabilistic in some simple way.
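Editor's note: the conversation does not spell out how the "factorial number of possible paths" arises, so the following Python sketch is only an illustration of the general idea. It exhaustively enumerates the routes between two tables in a tiny, hypothetical schema graph, and then prints how fast n! grows, which is the number of possible orderings of n items and a rough sense of why this kind of search is left to a machine. The `links` graph and the `simple_paths` function are invented for this example.

```python
import math

# Hypothetical schema graph: each table lists the tables it links to.
links = {
    "Patient": ["Visit", "Billing"],
    "Visit": ["Patient", "Lab", "Billing"],
    "Lab": ["Visit"],
    "Billing": ["Patient", "Visit"],
}


def simple_paths(graph, start, goal, path=None):
    """Yield every path from start to goal that visits no table twice."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph[start]:
        if nxt not in path:
            yield from simple_paths(graph, nxt, goal, path)


paths = list(simple_paths(links, "Patient", "Lab"))
for p in paths:
    print(" -> ".join(p))

# The number of orderings of n tables grows factorially with n.
for n in (5, 10, 20):
    print(n, math.factorial(n))
```

The search is deterministic: given the same graph it always returns the same set of paths, with no probabilities involved.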
Harry Glorikian: Yeah, I’m a firm believer in the two coming together. And again, I just look at them as a box. I always tell people, it’s a box of tools. I need to know the problem, and then we can reach in and pick out which set of tools is going to come together to solve this issue, as opposed to this damn word called AI that everybody thinks is one thing that they’re throwing at the wall to solve a problem.
Harry Glorikian: But you’re trying to solve, I’m going to say, data interoperability. And on this show I’ve had a lot of people talk about interoperability in health care, where I actually believe you could break the system because things aren’t working right, or I can’t see what I need to see across the two hospitals that I need information from. But you published an essay on Medium about Haven, the health care collaboration between Amazon, JPMorgan, and Berkshire Hathaway. Their goal was to use big data to guide patients to the best-performing clinicians and the most affordable medicines. They originally were going to serve these first three founding companies. I think, knowing the people that started it, their vision was bigger than that. There was a huge, you know, to-do when it came out. Fireworks and everything. Launched in 2018. They hired Atul Gawande, famous author, surgeon. But then Gawande left in 2020. And, you know, the company was sort of quietly pushed off into the sunset. Your essay argued that Haven likely failed due to data interoperability challenges. I mean, how so? What specific challenges do you imagine Haven ran into?
Eric Daimler: You know, it’s funny, I say in the article very gently that I imagine this is what happened. And it’s because I hedged it that the Harvard Business Review said, “Oh, well, you’re just guessing.” Actually, I wasn’t guessing. No, I know. I know the people that were doing it. I know the challenges there. But I’m not going to quote them and get them in trouble. And, you know, they’re not authorized to speak on it. So I perhaps was a little too modest in my framing of the conclusion. So this actually is what happened. What happens is, in the same way that we had the difficulty with healthcare.gov, in the same way that I described these banks having difficulty, heterogeneous databases don’t like to talk to one another, in a variety of different ways. You know, the diabetes example is true, but it’s just one of many, many cases of data just being collected differently for their own use. It can be as prosaic as “first name, last name” or “F.last name.” Right? It’s just that simple, you know? And how do I bring those together? Well, those are called entity resolutions. Those are somewhat straightforward, but not often 100 percent solvable. You know, this is just a pain. It’s a pain. And, you know, so what Haven gets into is they’re saying, well, we’re massive. Like Uber, we’ve got an effectively infinite balance sheet. We’ve got some very smart people. We’ll solve this problem. And, you know, this is some of the problem with getting ahead of yourself. You know, I won’t call it arrogance, but getting ahead of yourself is that you think, oh, I’ll just be able to solve that problem.
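To make the “first name, last name” versus “F.last name” example concrete, here is a minimal Python sketch of a naive entity-resolution key. The function name `name_key`, the sample records, and the choice to match on first initial plus last name are all assumptions made for illustration; this is not how Conexus or any particular product resolves entities.

```python
import re

def name_key(raw: str) -> tuple[str, str]:
    """Reduce a name to (first initial, last name), both lowercase.

    Hypothetical helper for illustration only.
    """
    raw = raw.strip()
    if "," in raw:
        # "Smith, Jane" style: last name comes first
        last, first = [part.strip() for part in raw.split(",", 1)]
    else:
        # "Jane Smith" or "J.Smith" style: split on whitespace or periods
        parts = [p for p in re.split(r"[.\s]+", raw) if p]
        first, last = parts[0], parts[-1]
    return (first[0].lower(), last.lower())

# Made-up records, as they might appear in three different source systems
records = ["Jane Smith", "Smith, Jane", "J.Smith", "John Smith", "Jane Smythe"]

groups: dict[tuple[str, str], list[str]] = {}
for record in records:
    groups.setdefault(name_key(record), []).append(record)

for key, members in groups.items():
    print(key, members)
```

The three spellings of Jane Smith land in one group, but so does John Smith, while the likely typo “Jane Smythe” is left out. That is one way to read the remark that entity resolution is somewhat straightforward but not often 100 percent solvable.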
Eric Daimler: You know, credit where credit is due to Uber. They looked both deeper, saying, oh, this can’t be solved at the level of computer science, and they looked outside, which is often a really hard organizational exercise. That just didn’t happen at Haven. They thought they could solve it themselves and they just didn’t. Not only did the databases have their own structure, but they also were stored in different formats or by different vendors. So you have an SAP database, you have an Oracle database. That’s another layer of complication. And when I say that these take $1,000,000 to connect, that’s not $1,000,000 one way. It’s actually $2 million if you want to connect it both ways. Right. And then when you start adding five, let alone 50, you take 50 factorial. That’s a very big number already. You multiply that times a million, and 6 to 12 months for each, and a hundred or two hundred people each. And pretty soon it’s an infeasible budget. It doesn’t work. You know, the budget for us solving Uber’s problem in the traditional way was something on the order of $2 trillion. You know, we had a bank in the U.S., and the budget for their vision was a couple of billion. Like, it doesn’t work. Right. That’s what happened at Haven. They’ll get around to it, but they’re slow, like all big organizations are. They’ll get around to solving this at a deeper level. We hope that we will remain leaders in database integration when they finally realize that the solution is at a deeper level than the existing tools.
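The arithmetic behind these figures can be sketched in a few lines of Python. The $1,000,000 per one-way connection comes from the conversation. The n × (n − 1) count of one-way point-to-point links is an assumed simple model of hand-built integration, not a figure from the episode, and the factorial is printed alongside it only to show how quickly the number of possible orderings grows.

```python
import math

# Figure quoted in the conversation: cost of a single one-way connection
COST_PER_LINK_USD = 1_000_000

def directed_links(n: int) -> int:
    """One-way point-to-point connections among n databases (assumed model)."""
    return n * (n - 1)

def point_to_point_cost(n: int) -> int:
    """Cost in USD of wiring every database to every other one, both ways."""
    return directed_links(n) * COST_PER_LINK_USD

for n in (2, 5, 50):
    print(
        f"{n:>2} databases: {directed_links(n):>5} links, "
        f"${point_to_point_cost(n):,}, n! = {math.factorial(n):.3e}"
    )
```

Under this model two databases cost $2 million to connect both ways, matching the figure in the conversation, and 50 databases need 2,450 links, or about $2.45 billion before counting staff and time. The factorial grows far faster still: 50! is roughly 3 × 10^64.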
Harry Glorikian: So, I mean, there’s a lot of people trying to solve this problem. It’s one of those areas where if we don’t solve it, I don’t think we’re going to get health care to the next level, to sort of manage the information and manage people and get them what they need more efficiently and drive down costs.
Eric Daimler: Yeah.
Harry Glorikian: And I do believe that EMRs are... I don’t want to call them junk. Maybe I’m going too far, but I really think that, you know, if you had decided that you were going to design something to manage patients, that is not the software you would have written to start. Hands down. Which I worry about, because these places spent so much putting them in that trying to get them to rip them out and put in something that actually works is challenging. You guys were actually doing something in COVID-19, too, if I’m not mistaken. How is that project going? I don’t know if it’s over, but what are you learning about COVID-19 and the capabilities of your software, let’s say?
Eric Daimler: Yeah. You know, this is an important point. For anybody that’s ever used Excel, we know what it means to get frustrated enough to secretly hard-code a cell, you know, not keeping a formula in a cell. Yeah, that’s what happened in a lot of these systems. So we will continue with electronic medical records to bring these together, but they will end up being fragile, besides slow and expensive to construct. They will end up being fragile because they were at some point hard-coded. And how that gets expressed is that the next time some other database standard appears inside of that organization’s ecosystem, from an acquisition or a divestiture or even a different technical standard emerging, the whole process starts all over again. You know, we just experienced this with a large company that spent $100 million in about five years. And then they came to us and said, yeah, we know it works now, but we know a year from now we’re going to have to go through it again. And it’s not like, oh, we’ll just have a marginal difference. No, it’s again that factorial issue: that one database, connected to the other 50 that already exist, creates this same problem all over again at a couple of orders of magnitude. So what we discover is that these systems in the organization will continue to exist.
Eric Daimler: These fragile systems will continue to exist. They’ll continue to scale. They’ll continue to grow in different parts of the life sciences domain, whether it’s for clinicians, whether it’s for operations, whether it’s for drug discovery. Those will continue to exist. They’ll continue to expand, and they will begin to approach the type of compositional systems that I’m describing from quantum computers or Minecraft or smart contracts, where you then need the discovery and math that Conexus expresses in software for databases. When you need that is when you need to prove the efficacy, or otherwise demonstrate the lack of fragility or the integrity of the semantics. Conexus can, because it’s a law of nature and it’s in math, prove the integrity of a database integration with 100 percent accuracy. And that matters in high-consequence contexts, when you’re doing something as critical as drug side effects for different populations. We don’t want your data to be misinterpreted. You can’t afford lives to be lost, or, in regulation, you can’t afford data to be leaking. That’s where you’ll ultimately need the category theory and categorical algebra. You’ll need a provable compositional system. You can continue to construct these ones that will begin to approach compositionality, but when you need the math is when you need to prove it, for the high-consequence contexts of lives, of money or, related to that, of regulation.
Harry Glorikian: Yeah, well, I keep telling my kids, make sure you’re proficient in math, because you’re going to be using it for the rest of your life. And finance. I always remind them about finance because I think both go together. But you’ve got a new book coming out. It’s called “The Future is Formal,” and not tuxedo-like formal, but you’re using the word formal, and I think you have a very specific meaning in mind. And I do want you to talk about it, but I think what you’re referring to is how we want automated systems to behave, meaning everything from advertising algorithms to self-driving trucks. And you can tell me if my assumption is correct or not.
Eric Daimler: It’s a great segue, actually, from the math. You know, what I’m trying to do is bring people that are not programmers or information technology researchers day to day into the conversation around automated digital systems. That’s my motivation. And my motivation is powered by the belief that we will bring out the best of the technology with more people engaged. And with more people engaged, we have a chance to embrace it and not resist it. You know, my greatest fear, I will say, selfishly, is that we come up with technology that people just reject. They just veto it because they don’t understand it as a citizen. That also presents a danger, because I think that companies’ commercial expressions naturally will grow towards where their technology is needed. So this is actually to some extent a threat to Western security relative to Chinese competition, that we embrace the technology in the way that we want it to be expressed in our society. So, trying to bring people into this conversation, even if they’re not programmers: the connection to math is that there are 18 million computer programmers in the world. We don’t need 18 million and one, you know. But what we do need is people to be thinking, I say, in a formal way, but also just to be thinking about the values that are going to be represented in these digital infrastructures.
Eric Daimler: You know, somewhere as a society, we will have to have a conversation with ourselves to determine the car driving to the crosswalk, braking or rolling or slowing or stopping completely. And then who’s liable if it doesn’t? Is it the driver or is it the manufacturer? Is it the programmer that somehow put a bug in their code? You know, we’re entering an age where we’re going to start experiencing what some person calls double bugs. There’s the bug in maybe one’s expression in code. This often could be the semantics. Or in English. Like, your English doesn’t make sense. Right? Or was it actually an error in your thinking? You know, did you leave a gap in your thinking? This is often where some of the bugs in Ethereum and smart contracts have been expressed. You know, there’s an old programming rule where you don’t want to say something equals true. You always want to be saying true equals something. If you do the former, not the latter, you can actually create bugs that can create security breaches.
Eric Daimler: Just a small little error in thinking. That’s not an error in semantics. That level of thinking, you don’t need to know calculus for, or category theory for the sciences. You just need to be thinking in a formal way. You know, often lawyers, accountants, engineers, anybody with scientific training can more quickly get this idea. Where those that are educated in liberal arts can contribute is in reminding themselves of the broader context that wants to be expressed, because often engineers can be overly reductionist. So there’s really a push and pull or, you know, an interplay between those two sensibilities that we then want to express in rules. And that’s ultimately what I mean by formal: formal rules. Tell me exactly what you mean. Tell me exactly how that is going to work. You know, physicians would understand this when they think about drug effects and drug side effects. They know exactly what it’s supposed to be doing, you know, with some degree of probability. But they can be very clear about it. It’s that clear thinking that all of us will need to exercise as we think about the development and deployment of modern automated digital systems.
Harry Glorikian: Yeah, you know, it’s funny, because that’s the other thing I tell people. When they say, what should my kid take? I’m like, have him take, you know, basic programming, not because they’re going to do it for a living, but they’ll understand how this thing is structured and they can wrap their mind around how it is. And, you know, I see how my nephew thinks, who’s from the computer science and healthcare world, and how I think, and sometimes it’s funny watching him think. Or one of the CTOs of one of our companies, how he looks at the world. And I’m like, you’ve got to back up a little bit and look at the bigger picture. Right. And so it’s the two of us coming together that make more magic than one or the other by themselves.
Harry Glorikian: So, you know, I want to jump back to the different roles you’ve had in your career. Like you said, you’ve been a technology investor, a serial startup founder, a university professor, an academic administrator, an entrepreneur, a management instructor, a Presidential Innovation Fellow. I don’t think I’ve missed anything, but I may have. You’re also a speaker, a commentator, an author. Which one of those is most rewarding?
Eric Daimler: Oh, that’s an interesting question. Which one of those is most rewarding? I’m not sure. I find it to be rewarding with my friends and family. So it’s rewarding to be with people. I find that to be rewarding in those particular expressions. My motivation is, you know, just bringing people in to have a conversation about what we want our world to look like. Whether the technologies that I work with every day end up closer to the dystopia of Hollywood narratives or closer to our hopes around the utopia that’s possible, where this is on that spectrum is up to us, in our conversation around what we want these things to look like. We have some glimpses of both extremes, but I’d like people, and I find it to be rewarding, to just be helping facilitate, helping catalyze that conversation. So being the catalyst of that conversation, in whatever form it takes, is where I enjoy being.
Harry Glorikian: Yeah, because I was thinking about, you know, what can you do as an individual that shapes the future. Do any of these roles stand out as more impactful than others, let’s say?
Eric Daimler: I think the future is in this notion of composability. I feel strongly about that, and I want to enroll people into this paradigm as a framework from which to see many of the activities going on around us. Why have NFTs come into the public media so quickly? Why does cryptocurrency capture our imagination? Those, and TikTok and the metaverse, are all expressions of this quick reconfiguration of patterns in different contexts that themselves are going to become easier and easier to express. The future is going to be owned by people that take the special knowledge that they’ve acquired and then put it into short business expressions. I’m going to call them rules, which can then be recontextualized and redeployed. This is my version of, or my abstraction of, what people call the future being just all TikTok. It’s not literally that we’re all going to be doing short dance videos. It’s that TikTok is an expression of people creating short bits of content and then having those be reconfigured and redistributed. It’s closely related to category theory. That can be in medicine or clinical practice or in drugs, but it can be in any range of expertise or knowledge. And what’s changed, and what is changing, is the different technologies that are being brought to bear to capture that knowledge so that it can be scalable, so it can be compositional. Yeah, that’s what’s changing. That’s what’s going to be changing over the next 10 to 20 years. The more you study that, I think the better off we will be. And I’d say, you know, for my way of thinking about math, you might say the more math, the better. But if I were to choose for my children, I would replace trig and geometry and even calculus, some people would be happy to know, with categorical algebra, category theory, and with probability and statistics.
So I would replace calculus, which I think is really the math of the 20th century, with something more appropriate to our digital age, which is categorical algebra.
Harry Glorikian: I will tell my son, because I’m sure he’ll be very excited if I tell him not calculus. But he’s not going to be happy when I say go to this other area, because I think he’d like to get out of it altogether.
Eric Daimler: It’s easier than calculus. Yeah.
Harry Glorikian: So, you know, it was great having you on the show. I feel like we could talk for another hour on all these different aspects of category theory. You know, I’m hoping that your company is truly successful and that you help us solve this interoperability problem, which I’ve been talking about forever, it seems like. I feel like, you know, the last 15 or 20 years. And I still worry about whether we’re any closer to solving that problem, but I’m hopeful, and I wish you great success on the launch of your new book. It sounds exciting. I’m going to have to get myself a copy.
Eric Daimler: Thank you very much. It’s been fun. It’s good to be with you.
Harry Glorikian: Thank you.
Harry Glorikian: That’s it for this week’s episode.
You can find a full transcript of this episode about category theory as well as the full archive of episodes of The Harry Glorikian Show and MoneyBall Medicine at our website. Just go to glorikian.com and click on the tab Podcasts.
I’d like to thank our listeners for boosting The Harry Glorikian Show into the top three percent of global podcasts.
If you want to be sure to get every new episode of the show automatically, be sure to open Apple Podcasts or your favorite podcast player and hit follow or subscribe.
Don’t forget to leave us a rating and review on Apple Podcasts.
And we always love to hear from listeners on Twitter, where you can find me at hglorikian.
Thanks for listening, stay healthy, and be sure to tune in two weeks from now for our next interview.
FAQs about category theory and healthcare
How does category theory help coders?
Category theory offers a formal way of thinking about composition and structure in computer programs, providing a common language for discussing abstract concepts and helping coders to reason about the design and behavior of their programs.
For example, in category theory, the notion of a functor can be used to describe how data structures can be mapped between different contexts in a way that preserves their relationships, providing a basis for understanding and designing modular and reusable code. The theory of monads can be used to describe how to build computational effects, such as error handling or I/O, in a way that is composable and predictable.
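As a rough illustration of both ideas, here is a minimal Python sketch. The helper names (`compose`, `fmap`, `bind`, `parse_int`, `reciprocal`) and the sample temperature readings are assumptions made for this example. Using `None` to signal failure is a simplified Maybe-style pattern, not a complete monad implementation.

```python
from typing import Any, Callable, Optional

def compose(f: Callable, g: Callable) -> Callable:
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))

def fmap(f: Callable, xs: list) -> list:
    """List functor: apply f to every element, keeping the list's shape."""
    return [f(x) for x in xs]

# Functor law: mapping a composed function equals mapping in two steps.
celsius = [36.6, 38.2, 40.0]          # made-up readings
to_fahrenheit = lambda c: c * 9 / 5 + 32
is_fever = lambda f: f >= 100.4

one_pass = fmap(compose(is_fever, to_fahrenheit), celsius)
two_pass = fmap(is_fever, fmap(to_fahrenheit, celsius))
print(one_pass == two_pass)           # the two routes agree

def bind(value: Optional[Any], f: Callable) -> Optional[Any]:
    """Maybe-style bind: skip f once an earlier step has failed (None)."""
    return None if value is None else f(value)

def parse_int(text: str) -> Optional[int]:
    try:
        return int(text)
    except ValueError:
        return None

def reciprocal(n: int) -> Optional[float]:
    return None if n == 0 else 1 / n

# Error handling composes: no try/except is needed at the call site.
for text in ("4", "0", "abc"):
    print(text, bind(parse_int(text), reciprocal))
```

The first half checks that transforming data in one pass or in two gives the same result, which is the sense in which a functor preserves relationships. The second half chains two steps that can each fail, and a failure anywhere simply propagates as `None`.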
Overall, category theory provides a high-level, abstract view of software systems that can help coders to think more precisely and systematically about the design and behavior of their programs, leading to more robust and maintainable code.