Debunking large language models in healthcare with Isaac Kohane

EPISODE HIGHLIGHTS
Evaluating large language models in healthcare is complex, and a human-in-the-loop approach, with doctors working alongside the models, is currently necessary.
The emergence of powerful AI language models raises questions about human cognition, the structure of language, and the ways we prompt and interact with each other.
The medical establishment is slow to adopt new technologies unless they increase revenues, but the use of electronic health records has been incentivized due to billing efficiency.
The use of large language models in healthcare may become a medical legal liability, as failure to utilize the technology could be seen as negligence in the future.

Harry Glorikian: Hello. I’m Harry Glorikian, and this is The Harry Glorikian Show, where we explore how technology is changing everything we know about healthcare.

My guest today is Dr. Isaac Kohane.

He’s the chair of the Department of Biomedical Informatics at Harvard Medical School.

And he’s the co-author of a new book called The AI Revolution in Medicine: GPT-4 and Beyond.

You’re no doubt aware that chatbots built on GPT-4 and other large language models are making a big splash in pop culture and starting to change industries like search, advertising, and customer service.

And here on the show we’ve talked recently about how specialized versions of large language models might someday become important reference tools for doctors and other professionals.

But what Dr. Kohane and his colleagues argue in the book is that this future isn’t theoretical.

It’s here, right now.

He says doctors are already using GPT-4 and other large language models in their daily practice, whether or not the hospitals and clinics where they work know about it.

On the one hand that’s understandable, since no physician likes to make a mistake or miss a diagnosis, and even general large language models like GPT-4 can generate high-quality medical advice.

Large language models are also really good at automatically generating different kinds of text, which can be a big help in an era where most doctors are feeling completely crushed under the weight of all the paperwork they have to do every day.

But on the other hand, this is all a little scary, since there are no real guidelines yet for how large language models should be deployed in medical settings, how to guard against the new kinds of errors that AI can introduce, or how to use the technology without compromising patient privacy.

We’re just at the beginning of the conversation about how to manage those challenges — and how to use the latest generation of AI tools to make healthcare delivery more efficient, without endangering patients along the way.

That’s what Dr. Kohane’s book is all about, and that’s why I wanted to have him on the show.

So here’s our conversation.

Harry Glorikian: Dr. Kohane, welcome to the show.

Isaac Kohane: Glad to be here.

Harry Glorikian: So it was great to see you the other day at one of the local events here in Cambridge. But, you know, just for everybody who’s listening to the show, I wondered if you could tell me a little bit about yourself, your position at Harvard? You’re the inaugural chair of the Department of Biomedical Informatics. You know, what’s the focus of that department? Et cetera. If you could give everybody an overview.

Isaac Kohane: Sure. So in some sense, the department is the culmination of a lifelong dream I had, to create a community that I would have wanted to have 30 years ago. I arrived fresh from Geneva, Switzerland, went to college here and learned a lot about computer science. And even though I was a biology major, I really enjoyed learning about computer science. When I was in medical school, I realized that I really wanted to pursue studies in computer science. And so I went on to pursue a PhD in computer science. This was during another heyday of artificial intelligence in the 1980s, when there was also a lot of hype and hope around it, in retrospect far less well grounded than what we currently observe. And when I finished my training, both in medicine and in computer science, I went on to start a lab at Children’s Hospital. And as these things go, success breeds recognition. And so gradually the medical school asked me to establish a center of biomedical informatics and then a Department of Biomedical Informatics. What is the department? The department really is about the intersection of biomedical research and computing. And it’s a fairly broad agenda, all the way from genomics (can we use computational techniques to find better cancer drugs or drugs to treat dementia?) to clinical decision support. And we’re fortunate to have faculty of whom I would say the majority have PhDs and a minority of us have MDs or, like me, MD-PhDs. But we’re all broadly in support of this mission of advancing biomedicine and biomedical research through the application of computing and quantitative reasoning and quantitative methods.

Harry Glorikian: Yeah. You know, I’m wondering, you know, because you’ve been in this for a long time, did you ever think we’d be where we are now? I mean, where, in the form of these large language models in medicine, like we’ve been handed a tool so powerful that we almost have to slow down to figure out how not to use it. So I’m wondering, you know, if you think about where we are now versus where we were actually not that long ago.

Isaac Kohane: Yeah, that’s right. So I think I am in good company when I go back to a really wonderful movie based on the imagination of the science fiction author Arthur C. Clarke, in collaboration with Stanley Kubrick. The movie was called 2001: A Space Odyssey, and in this movie, two important things happened. One is humanity’s entry into space. And that was an exciting time, because this was the time of the moonshot and a very active Apollo program, and computers were being developed. But many of us actually thought that by, let’s say, 2020, we would already be a multi-planetary species. And we thought the HAL computer was unrealistic. And in fact, what has happened is that essentially a computer of at least HAL’s linguistic expertise seems to have been realized. And we’ve been a little bit slow until very recently in achieving some of the space exploration ambitions. And so Peter Lee, who’s a corporate vice president at Microsoft and also the head of Microsoft Research, reached out to me in October, I believe, of 2022 to share with me a then highly confidential debrief on what was eventually going to be called GPT-4, but then had the code name DaVinci-3. I have tremendous respect for Peter. He not only has these industrial achievements with Microsoft, but he also was a program manager at DARPA and, before that, a computer scientist and department chair at Carnegie Mellon University. But nonetheless, when he first told me what it was doing, I was a little bit skeptical. But then he started showing me what it could do. And I was truly dumbfounded. And I had to stop him. And I said, you know, in the abstract, I’m not completely surprised by what you’re showing me, but I thought this was going to be maybe 5 to 15 years away from now. I just did not expect this to happen now. Yeah, I mean, so it’s been…

Harry Glorikian: So many people. I’ve tried to explain it. And then once you show them, they’re like, oh, like, yeah, it’s this moment where, you know, seeing is believing.

Isaac Kohane: And I don’t know of a single serious computer scientist who can fully explain in an intuitive manner why a program that can predict the likelihood of the next word that should follow a sequence of words seems to be capable of very elaborate conversations, theory of mind, causal reasoning, and the ability to help you write programs. We all can talk about the reasons why it might work and how it might be doing it. But when you get through it, it’s very bizarre, because we don’t think that’s the way we think. When I’m thinking about what I’m going to say to you, yes, maybe one word follows the other. But I’m also thinking about how I am going to package this. For the model, there’s no thinking about how to package it. It’s just one word after the other. In fact, what’s funny is that if you make it think, in the sense that you make it reflect on what it just said, sometimes that improves its performance.

Harry Glorikian: Yep, yep.

Isaac Kohane: But nonetheless, it’s bizarre. I think just the science of understanding why this is will actually give deep insights not only into computing, but into the structure of human language and the cognitive processes behind language. And, you know, in fact, when Peter Lee told me about this, I said to him something that most computer scientists are familiar with. I used the following metaphor, which is due to Daniel Kahneman, the Nobel Prize winner who developed with Tversky a lot of our knowledge about human cognition and human errors. Long after the death of his partner in science, he wrote a book called Thinking, Fast and Slow. And there he posits two different types of thinking, fast and slow. And the fast thinking is the very fast, responsive, very evolutionarily driven kind of thinking. So when we see, out of the corner of the eye, something heading towards us, we start our fight-or-flight response. And so it’s more of a reflexive intelligence. And then there’s the slow part, which is when we’re aware of what we’re thinking, when we do logical inference, but it’s much slower.

Isaac Kohane: And so we know that songbirds, the language of songbirds, can actually be recapitulated by a Markov model. A Markov model is a simple model that just predicts probabilistically going from one state to the other. And it turns out you can recapitulate and have a good predictive model of songbird language just with a Markov model. It doesn’t work at all for humans, but here we have a far more complicated, far more elaborate, far more highly parameterized model of human verbiage. It’s an interesting question: To what degree is our language “just say the next word”? And we’re in a very interesting world. We should return to medicine shortly, because I am not by any stretch a cognitive scientist, but it really poses a question to a lot of us, which is: How important is how we prompt each other in our interactions?
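
[Editor’s illustration, not part of the conversation: a minimal sketch of the kind of Markov model Dr. Kohane describes, in which the next state is drawn from probabilities that depend only on the current state. The syllable labels and the three example "songs" are made up for illustration and are not real birdsong data; the function names are hypothetical.]

```python
import random
from collections import defaultdict


def fit_markov(sequences):
    """Estimate first-order transition probabilities from example sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Count every observed step from one state to the next.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    model = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        model[state] = {s: c / total for s, c in next_counts.items()}
    return model


def generate(model, start, max_length, rng):
    """Sample a sequence; each step depends only on the current state."""
    out = [start]
    # Stop at max_length or when the current state has no known successor.
    while len(out) < max_length and out[-1] in model:
        options = model[out[-1]]
        states = list(options)
        weights = [options[s] for s in states]
        out.append(rng.choices(states, weights=weights, k=1)[0])
    return out


# Made-up "songs": each letter stands for one hypothetical syllable.
songs = [list("ABCABD"), list("ABCD"), list("ABD")]
model = fit_markov(songs)
song = generate(model, "A", 8, random.Random(0))
print(model["B"])      # probabilities of what follows syllable B
print("".join(song))   # one sampled song
```

[A large language model also predicts what comes next, but it conditions on a long context with a very large number of learned parameters rather than on a single previous state with a small table of probabilities.]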

Isaac Kohane: And I think that it’s interesting how, sometimes accurately and sometimes metaphorically, different breakthroughs have effects on societies. So, for example, although it was incredibly flawed, the Freudian revolution made us start thinking about our subconscious or non-conscious thoughts. The theory of general relativity, which really doesn’t speak to culture at all, you know, gave us ideas about relativism and how observer dependence can affect many things. And what I’m seeing in my own interactions and in the broader society is that the fact that a machine can perform this activity that we’ve so long associated with higher functioning that is unique to humans is creating all sorts of metaphors. Like what I just said, you know, how do I prompt you, Harry, effectively to respond the way I want? And you know, it starts people thinking more about how marketing works, how political argumentation works. And there may be processes that we’ve been dimly aware of that, in fact, may succumb to that kind of framework. Now, again, I think it’s stretching the metaphor, but I think it’s important to recognize this metaphor is going to be growing within our culture because we just didn’t expect to have programs that could converse with apparent empathy, with a high level of linguistic virtuosity. But that brings us to medicine. And all of a sudden, patients have access to a very flawed but very powerful assistant that knows a heck of a lot about medicine, knows more than any individual doctor, and even in very narrow areas of medicine can have expert performance, but also can go off the rails in unpredictable and bizarre and, unfortunately, convincing ways.

Harry Glorikian: Yeah, well, this brings me to, you know, Peter Lee approached you, and I believe there was someone else, Carey Goldberg. And the three of you got together and wrote this book called The AI Revolution in Medicine: GPT-4 and Beyond, which came out, I think, in mid-April of this year. And I know that the book focuses on GPT-4, which is a general large language model, but it’s just one of a growing number of models that we’re seeing today. I mean, we’re already seeing the emergence of these AI language models for specialized fields, including medicine. I recently interviewed the Google team on Med-PaLM 2. How much of the book is specific to just GPT-4? Is there anything you wrote that wouldn’t necessarily apply to, say, Google’s Bard or PaLM 2 or Anthropic’s Claude or Facebook’s LLaMA program? I mean, I think most of what you wrote, I almost want to say, could be applied to all of these, even though GPT-4 was the canvas you were using.

Isaac Kohane: Yes. So GPT was the canvas. And not only that, but it was an interlocutor. We could ask it things and really check out how at least one of these models would work. But I think that all these models are incredibly impressive and they have different strengths. And furthermore, they change almost day to day in their performance. And in the book I wrote a chapter where I was trying to think out loud about how you evaluate these. And it was really not meant to be specific to GPT-4. I talked about the trial, the trainee and the torchbearer. And the trial is a slam dunk evaluation process for the kind of narrow AI programs that were already incredibly surprising, that were mostly in the visual appreciation fields like radiology and pathology and dermatology, and that really had breakthrough performance. But it was a very narrow set of capabilities. And you could run a trial that doctors could understand, that the FDA could understand, where you say, here’s the goal, here are the inclusion criteria, here are the exclusion criteria. And then you’d know, did we succeed or not in showing that this program is superior to humans or superior to another program and so on? There’s a whole bunch of really good methodology for that. And then I asked, but does that work for large language models, whether GPT or not? And the problem is it does not, because you can start by asking it a focused diagnostic question, but because of its capabilities, it will have a conversation with you that includes other things that get involved in the conversation, almost like an ideal doctor, in the sense that it’s willing to have a long conversation with you.

Isaac Kohane: And so it can go everywhere. And so how do you do a trial of that? I mean, the only trial you could conceive of would be a trial where you compared one hospital to another hospital, a health care system with it to a health care system without it. That’s going to take a lot of patience until you know whether, across all diseases, it’s actually going to have significant impact or not. And so the other model that we have is the trainee model. We make doctors, because they’re general purpose as well, go through a lot of hoops, learn things that they may not ever use, like calculus, just to show us that they can do it. And then we have them take exams and have them go into rotations and show that even with a lack of sleep, they’re still reasonably polite and thoughtful and so on. And I noted that all these large language models are successively improving their performance on all these formal tests. And I’m sure the Med-PaLM team told you about how they’re doing wonderfully on a variety of examinations that doctors normally take.

Harry Glorikian: Yes.

Isaac Kohane: So GPT, all these programs are getting better. But here’s the problem. I have some rough idea of what it means for a human being to have passed these tests. But I don’t know what it means for GPT to have passed the test. It means it’s a great test passer, but it doesn’t walk, talk and chew gum in the clinic right now. And so I have no idea what it means and it’s very hard to predict. And so my conclusion there was that until we figure this out, it’s going to have to be part of a team, that you have to have a doctor in the loop. And because it can’t work by itself, we have to have a human in the loop, which is not easy, by the way, because we’re all lazy, fundamentally, as human beings, or we habituate. And so if I get expert advice and, you know, nine times out of ten, it’s been great, I may sort of fall asleep at the wheel metaphorically for the 10th time and not see that it actually didn’t do the right thing. So the human’s going to have to be in the loop and that’s going to be challenging. And I note that Tesla has actually embraced this challenge, because of course it’s moving towards self-driving using a variety of technologies. But A) it doesn’t want people to kill themselves when it makes a mistake, and B) it doesn’t want a liability either. And so what happens, and I have a Tesla so I can relate (I’m not the best driver, unfortunately, which is probably why I decided to have a Tesla), is that if you pick up your phone when you’re driving (my children want to kill me when I do that), if you hold the phone for too long, it’s looking at you and it starts saying, you know, pay attention. And it’s not just about moving the wheel. It wants you to pay attention. And if you don’t pay attention, if you don’t put down that phone, it will say, I’m stopping autopilot and you’re on your own and you have to start driving. And it gets even punitive.
It gives you five chances, and if you use up your five chances, it switches it off for a long time, I think, until the next update. So we may have to put in those kinds of, um, security checks for doctors to make sure that they don’t become complacent in the use of these models. But at the same time, it’s very clear to me, and we can get into it, that it can really dramatically change the overall performance of clinicians in health care.

Isaac Kohane: The third model I mentioned was the torchbearer, which is superhuman performance, and I gave examples. Again, this is not unique to GPT-4. They’re all heading in the same direction. I gave an example of a patient who’s been undiagnosed, has atypical presentations, and is doing very poorly. Multiple doctors have not been able to diagnose them. Even in this network that I’m a part of, the Undiagnosed Diseases Network, we were unable to diagnose it after genomic sequencing. So I gave GPT-4 the five genes that had bona fide mutations that should have caused loss of function. I said, Which one of these is responsible for this patient’s disease? And it goes on and says, Well, this is probably the most likely one, even though it’s not typical, because of X, Y, and Z reasons. And then it goes down and says these are less likely ones. It turns out that the one it picked was the one that we figured out was in fact the correct one after much bench research. But the point I’m making is that the torchbearer, sort of like a Dr. House person, has to be part of a team. We can’t let Dr. House go rogue. They have to be part of a team. We have to be checking it. And that’s the way forward. So I’ll stop there because maybe you want to push this discussion one way or another.

Harry Glorikian: Well, it’s interesting, right? Because, you know, when you look at it, Google hasn’t released Med-PaLM 2 to the public yet, and they’re releasing it gradually to professionals, in part, from what I got, because they want to make sure there are some responsible safeguards put around the technology first. But if GPT-4 is just as good at answering medical questions, then isn’t the cat sort of out of the bag already?

Isaac Kohane: Yes, the cat is absolutely out of the bag. The horse is out of the barn. The train has left the station. All those metaphors are true. And I have to say I’m delighted on two counts. I’m delighted because it’s enormously reassuring that we’re not heading to what I was worried would be a recurrence of the winter that happened after I did my PhD in the 80s, because we have in six months 100 million people using this intelligence for a variety of purposes, including real medical applications, and it is delivering some value. It’s imperfect in a variety of ways, but it’s delivering real value today. And so the fact that it survived a test, not a demonstration, but a test of letting it out into the dirty, dirty wild, is incredibly reassuring that this thing is providing something that’s useful. And we’ll get into why that is.

Harry Glorikian: Yeah.

Isaac Kohane: The second reason…

Harry Glorikian: I’m sorry. Go ahead.

Isaac Kohane: The second reason I am delighted is because the medical establishment is very slow to adopt technologies, especially when they do not increase revenues. So we adopted electronic health records fast after 2009 because A) there was a huge incentive plan from the Obama administration and B) electronic health records became vehicles for more effective billing. But merely providing improved decisions? Yes, you would think that that’s important to the medical system. Well, it’s important to individual doctors and it’s obviously important to patients. But if it causes a more detailed conversation with the patient, if it causes a doctor under a fee-for-service system to not do a procedure that they would have otherwise done, well, remember, the more we do in medicine under fee-for-service, the more we get paid. And so the fact that we have medical expertise now in the wild, I think, is going to be an enormous cudgel to make the medical establishment think about how it’s going to, in fact, use it. And furthermore, the fact that it’s been liberated in this particular way means that even though, predictably, the electronic health record companies are going to be working with Google and Microsoft and all the other vendors of large language models, there will also be plenty of sort of direct use of the large language models by doctors and by patients, not through the enterprise systems.

Harry Glorikian: Yeah, you well, and that’s….

Isaac Kohane: And that’s a good thing.

Harry Glorikian: Well, because if I go back, in the prologue of your book you gave a fictional scenario about, I think it was a second-year medical resident who’s dealing with several different patients with hard-to-solve medical conditions. And this person decides to consult GPT-4 about almost every decision. And it’s really interesting reading. And if you read it, it sounds a little bit like science fiction, right? But you emphasize that nothing in the story is beyond the actual capabilities of GPT-4 right now. So, I mean, what’s your impression of how widely doctors and other health care professionals are already using this or other models? And what are the main things they’re using it for?

Isaac Kohane: So. Well, I think everybody uses it in the way that it fills a specific need. And so I think it’s good to look a little bit back, because the team from Google that you interviewed are top notch, wonderful people. Truly wonderful people. Great scientists, computer scientists, even though they’re, I think, a little bit modest. But their company, Google, has a search engine that’s been used now for many, many years by patients to get answers. You know, it’s often referred to as Dr. Google. And that is despite knowing that a lot of the answers you get are ads that are not quite answering the question. And furthermore, many of the sites that it points to have high reputation but may not be right. They have a reputation because it’s some fad or another. And so it’s ironic that we’re letting that happen, and yet we’re concerned about something that actually is much less risky, in my opinion, in terms of medical advice. These large language models actually do a pretty good job. They do hallucinate, they do make mistakes. And they could be aligned to be more advertiser friendly. They could be aligned to go with the wrong medicine and so on. But for the moment, because of the way they’ve been trained and the way they’ve been aligned, they seem to provide much better medical advice.

Isaac Kohane: But how is it being used? For patients, because primary care is dying in the United States, you go to see your doctor, and if you have an acute problem, it’s hard to see them right away. You may have to wait months. Having someone be able to answer your questions knowledgeably is incredibly valuable. And so that’s how patients are using it. You went to visit your doctor and you have a lot of questions afterwards, and they’re not there to answer the questions now that you thought of them after the visit. That’s incredibly valuable. So that’s the way patients are using it. As for doctors, I know some doctors who are using it to answer some questions and review cases. That’s not the main use. The main thing doctors are using it for today is to get through the unpleasant bureaucracy of medicine. You know, we have to remember that 30% of the cost of healthcare in the United States is administrative overhead. So this beast, which is one fifth of the US economy, 30% of that is administrative overhead. So what does the administrative overhead look like? It looks like getting authorization to do a procedure for a patient, authorization to get a referral, a decision to reimburse at a certain level, work to decide what billing level to send to the insurer, and communications back and forth. All that stuff is what GPT and the other large language models are being used for. So, I mean, it was both amusing and, frankly, a little bit alarming to see doctors tweeting out back in January about ChatGPT. And remember, back in January ChatGPT was not GPT-4. It was GPT-3.5.

Harry Glorikian: Right.

Isaac Kohane: And you see doctors tweeting out, hey, I pasted the patient summary into GPT and I told it to write a letter of prior authorization. This is a letter giving the rationale for what you plan to do to the patient. So that takes at least five minutes and you have to think, and that’s effort. And this was just writing the note in two seconds. And there were lots of examples like that. So that’s the way doctors are using it right now. And by the way, the reason I said it was a little bit hair-raising is because it’s not apparent to everybody, but there are actually competing products from OpenAI and Microsoft, even though Microsoft is the major investor in OpenAI. There are products like ChatGPT, which are directly managed by and are products of OpenAI. Then there are products that are powered by GPT-4, which are available through the Azure cloud through Microsoft. And why am I making that distinction? Because Microsoft has a lot of health care partners already for a variety of healthcare-related businesses. So regulations like HIPAA, where you’re supposed to maintain a certain level of data security and privacy around patient data, are, in fact, enforced on their cloud. That’s not true of ChatGPT, because it’s not HIPAA compliant. So technically, doctors might be in violation of HIPAA by putting in private patient information that gets sent to OpenAI, which does not have any particular obligations to follow HIPAA rules. But in any case, that was a tangent, or a digression, just to say that even though it was not compliant, the benefit was so large that doctors would just start using it. And more broadly, the first thing I think that healthcare systems will be doing with these programs, more than likely, is not actually using them for decision support, but to improve administrative processes.
Right now they have basements full of people looking at the record after the patient is admitted, deciding how to bill for different procedures, and doing the same after a doctor sees a patient in an outpatient setting. That basement of people can be replaced now by just a program looking at the record and deciding. And it can be tuned in a variety of ways by the hospitals. On the other side, insurance companies historically have hired a lot of clinicians, nurses and doctors to look over claims and say, does that make sense? That’s now gone.

Harry Glorikian: So what you’re saying is there’s one system that is going to throw the ball to the other system and it’s going to go back and forth until it rationalizes.

Isaac Kohane: That’s exactly… that’s actually incredibly insightful, because the absurdity is going to be that the information war that we’ve had forever between clinicians billing and insurers paying, that information war, which has gone on for decades now, is going to be run by AIs, and they’ll be generating text on both sides. But they don’t actually need the text; the text is just for the human beings. So A) the text is going to go away eventually, and B) the real tradeoffs are going to be very explicit, and both the business opportunities and the policy opportunities will be clear. What do we actually want to pay for? Rather than use all these things to slow payments and so on.

Harry Glorikian: That’s going to be an interesting evolution of this that I actually hadn’t… I had been on top of the billing stuff, or asking for the authorizations, because, like, as soon as I saw that first tweet, I forwarded that to all my physician friends and was like, Look, look at this. This might save you a little bit of time. But the fact that it could now formulate on its own, and then the other side could formulate the response? I mean, well, the good part is, they should be able to do it very quickly and come to a conclusion, compared to humans doing it.

Isaac Kohane: But it’s like this information war is now going to become a war that’s happening in seconds rather than in days and weeks. And so that’s going to change the business. And also the cost, the administrative cost, is going to go down. Now, I wish I could say that all those savings will go to patients or to the government that pays for health care. Unlikely. But there are going to be huge savings. And the reason why I’m saying it’s going to happen there first is that billions and billions of dollars are at stake. These administrative costs are 30% of one fifth of the economy. You’re going to make money right away. And I’m not saying that I’m thrilled by it, because I really am focused on the clinical mission. That’s what I’ve trained for. That’s what I’m passionate about. But I think it would be silly if we didn’t realize that. And that aspect, by the way, is quite ably summarized by Peter Lee in a chapter he calls, I think, “The Paper Shredder,” where he talks about the ways this can create administrative efficiencies.

Harry Glorikian: So, all right. Jumping back, because, I think in the book, you said bluntly that it should not be used in medical settings without human supervision, as you mentioned earlier in our discussion. How do you think we’re going to get to a future where you would be comfortable letting this, you know, be used on a regular basis? I know that, you know, we have to have the human in the loop, right, so that the thing doesn’t say something harmful or get it wrong. But have you thought about how that might be structured?

Isaac Kohane: Yeah. I don’t have to think too hard because I see business plans around it, and I don’t want to sound like a shill for Microsoft. So I can say that back in 2019 I wrote an article in the New England Journal of Medicine with colleagues from Google, and one of the biggest uses I thought of for AI from the get-go was listening in on the clinic visit and, based on the past clinic notes and on what was being said in that visit, being able to generate a clinic note that then could be repurposed: sent to the insurer, sent to the referring doctor, sent to the patient. And by doing that, reverse some of the heartache caused by electronic health records, which have turned doctors into documentation clerks, and allow them, instead of staring at the computer screen for half the visit, to actually look at the patient, ask them questions, and not have to worry about, ah, am I getting this all down? Because instead you have multi-voice recognition and a summary. And the reason I mention that I wrote that with Jeff Dean from Google is so that I don’t sound like a shill when I say I’m pretty aware that Microsoft acquired Nuance, and I am pretty sure that one of their products is meant to do exactly that for clinicians.

Isaac Kohane: And so that’s... and that may sound like a non-clinical application, but in fact, not only does it improve the patient experience and the doctor experience, but you have an accurate summary of what’s going on in the clinic, where doctors, even when they’re writing it down, forget things. AI makes for a much better note, so you have much better decision making. It also creates an opportunity for the program now to say what’s missing and to, like, send an email afterwards to the doctor: By the way, I didn’t want to break your flow, but you actually forgot to... these medications interact. And so, without screwing up the workflow, in fact, by even easing the workflow, it gives an opportunity for much better quality information. And on the basis of that high quality information, for this agent to look over your shoulder and give you advice that you can decide to ignore or not. And what’s more, because these large language models in healthcare are conversational, and they seem to have notions, that we have to explore, of priority and so on, unlike a system which just generates 1000 alerts and you just get alert fatigue, it’s actually a conversation where the important things come up.

Harry Glorikian: Right? No, I mean, you know, when I’m doing my meetings, I do, you know, I’ll use summarization or recording where it summarizes it and it’s incredibly helpful. And then you automate it and it takes it and puts it into your spreadsheet so that you can track it later. I mean, it’s, you know, the level of automation that you can do so that you don’t forget these things is unbelievable.

Isaac Kohane: It is unbelievable. And I think the third author, Carey Goldberg, who is a really accomplished reporter, was the Moscow bureau chief during a period which, interestingly, she analogizes to the period that we have now, where the future is hard to see because of this disruptive change, because of these large language models in healthcare. She said that during the fall of the Soviet Union, which she was covering while she was there, you didn’t know what the next year was going to look like. She’s been a reporter for The New York Times and for the LA Times, and she was Boston bureau chief for Bloomberg. And she says, I think, in her chapter about the patient perspective, that the day may come soon when it’s no longer acceptable for doctors to actually work without this. So I’m looking over your shoulder.

Harry Glorikian: That brings me to the next. So I was thinking about this and I was like, I mean, in the book and throughout the book, you know, there’s several examples of where GPT-4 helped with legitimately tough diagnoses, as we said earlier, like a one-in-100,000 disease. And, you know, another one was completely unique, like 1 in 1,000,000, if I remember it correctly.

Isaac Kohane: That’s correct.

Harry Glorikian: Given its powers, it seems likely that many doctors with tough cases, or maybe many patients or their families, will want to consult the models now, long before there are formal rules, regulations, or guidance for their use in the medical setting. I mean, I almost want to say, wouldn’t it be hard to build a moral or ethical argument that if you don’t consult one of these models you’re failing to use one of the most powerful tools now at your disposal? I mean, now keeping in mind that it’s still just a tool, but can you imagine a near-future situation where failure to consult an AI can be something like grounds for a malpractice suit? I don’t know. I’m just thinking out loud.

Isaac Kohane: So thinking out loud, thinking as a doctor who loves our medical system, but also has had loved ones and has myself. We’ve all been patients and had loved ones as patients. If you have a loved one who’s in a tricky spot, an important decision has to be made. I would say, get all the information, give it to GPT-4 and say, Is this a reasonable thing to do? What are the other things that could be done? If it doesn’t say anything that makes you think that something else should be done, then fine. If it does say something should be done, ask yourself, Is this plausible? Does it pass common sense? And if it does, I think you should go talk to the doctor. So we’re not disintermediating the doctor. But this is essentially an instant second opinion. So if I would say today, if you have a loved one in a sticky spot, I would use this personally, too, because I can tell you as a doctor, expert in a lot of different parts of medicine, just looking, reading the literature is an impossible task for me to figure out what is actually reasonable. So this actually distills it in a way that’s quite remarkable. I’m not saying it distills it right, but then you can go and talk to an expert like your doctor and say, why don’t we do this? And if he or she gives a reasonable response, then fine.

Isaac Kohane: So that’s one side of it. And the other side of it is, I think it will become at some point a medical legal liability to not use this, at least to not have it working in the background, looking at every case. Is this something we missed? Should this patient be on this medication? Are these medications actually incompatible in some way? And the reality is, we know from multiple studies that doctors, because they’re human beings, make incredible numbers of errors. So just having it as a good error catcher is incredibly important. There are reports, and I think this is an overestimate, but nonetheless the National Academy of Medicine says, I believe, that the third most frequent cause of avoidable death in hospitalized patients is avoidable errors. And even if that were only 10% true, that’s still thousands of deaths that could be avoided by catching errors. So I think that Carey Goldberg’s intuition is in the end going to pan out, because now it’s not a theoretical thing. We have the capability to have a fallible but usually very expert system that has an opinion. We don’t have to agree with it all the time. We just have to check.

Harry Glorikian: Yeah. And I mean, sometimes it’s, you know, maybe the physician doesn’t know all the medications because another doctor prescribed it. But the system that has access to that data can, hey, you know, I don’t want to say raise its hand, but essentially say, hey, by the way, you know, this is, you may not have known this, but you’re going to have a bad reaction with these two meds because this patient is on this other med that you may not have known of.

Isaac Kohane: And again, I want to point out that we have to compare this not to some theoretical utopian health care system that does not exist, but to the American health care system where we’re already missing tens of thousands of primary care doctors. I think I even wrote in the book, I don’t know if I said it explicitly, but I have faculty, as the department chair. I get these wonderful new faculty who come from different parts of the world and they say, Oh, where should I get my primary care? So I called all my friends who are in primary care practices. Their practices were full. So then I asked another one of my friends who has a very elegant practice out of Mass General Hospital. And I said, Could they see you? He said, Well, I’m retiring. And I said, Well, who should they see? And he said, Honestly, I don’t know of any good practices that are open right now. So this is Boston where we have so many doctors. And yet, unless you’re willing to pay crazy dollars for concierge service, there’s no primary care, and it’s going to get worse because we’re all getting older as a society. And there’s very few incentives for doctors with debts and appealing other jobs to go into primary care. So primary care is absolutely dying. It’s also dying even in socialized medicine economies. There are strikes in France because the doctors feel overstretched, because there’s not enough of them to carry all the different burdens. That’s certainly also true in England. And in countries like China, there’s essentially no primary care ever, and as a patient, you have to decide which specialist to see. So the reality is consumers are on their own about when they should see the doctor, which doctor and so on. And that may sound harsh because a lot of us are plugged in, a lot of us giving these interviews are plugged into the healthcare system. But where people are not, that is truly the case. And so then that goes back to explaining why Dr. Google, flawed though it is, is still used so extensively and why it really is a leg up to have these decision support systems at least as an adjunct to human decision making.

Harry Glorikian: So that brings me to, there’s a part in a recent talk of yours, I believe, I was there, you said, if we want to avoid the political alignment that sends healthcare AI to a very bad place, we need to “flip the clinic.” So what do you mean by that when you’re saying it? What would it mean or what would it look like to actually do that, to flip the clinic?

Isaac Kohane: Well, thanks for asking that question. What it means is, first of all, primary care is important. But the reality is, if you’re healthy, you don’t actually have to go to a doctor’s visit to get your shots or to get your blood pressure checked and so on. And that’s one visit a year. And we know from multiple studies, of course, your blood pressure is not what’s going to be measured at that visit. We know that your activity level is incredibly important to your health, and that’s not going to be measured in the visit. We know that your diet is an incredible predictor of many, many health risks. And that also is not going to be measured in the primary care clinic. And so to flip the clinic means to have, controlled by you, observations that are relevant to health in your day-to-day life, and that the things that need to be done routinely for screening can just be done in basically any store, in a CVS pharmacy, in a Walmart and so on. But when things start going badly, when you actually need a doctor, you should be told, go, go see a doctor. This is not looking good. But until that point, you should be given feedback. You know, your diet could be improved, your exercise. Now, not everybody wants to listen, but not everybody listens to a primary care doctor either. But here, recognizing that most of the relevant data is in your daily life but not forgetting... you see, there’s a lot of quantified-life companies that just look at your activity from your Fitbit or your blood pressure. That is not very meaningful if you don’t have all your clinical data, right, because that actually puts it into place. So having the combination of your clinical data, for the few times where you are in a hospital or a clinical setting, combined with your daily data, there, by flipping the clinic, you can actually bring in decision support and interventions when they are needed by the patient rather than going through this Kabuki theater where you have to go and show up at a clinic visit. There’s very limited value there, and it doesn’t help you when things start hitting the fan or when it looks like things are going to hit the fan, because you’re going to wait for a whole year when in fact, if you looked at the direction of your weight or the direction of your heart rate or many other factors, you could see that you’re heading towards trouble.

Harry Glorikian: You almost want a, it’s almost like we all need our personal dashboard that we can look at, you know, regularly, so that, I mean, if you see the trend line going in the wrong direction, you know, and a few indicators, you may want to... I mean, nobody drives a car without a dashboard.

Isaac Kohane: That’s right. You need a dashboard that does know more about you. Like, not only does it know your heart rate, but it knows that you’re a patient with heart failure or that you’re a patient with diabetes. It knows your full clinical history. Yes, we need a smart dashboard, which then allows me to say an old man thing. When I was a grad student, right after I was a grad student, my thesis advisor, Peter Szolovits, who is a professor at MIT, had this idea called Guardian Angel, and I helped him develop this idea. And it was so far back, it was in 1994, that we were able to register a domain, guardianangel.org. And if you follow GA.org I’ll link it to some MIT site where you can see a scenario that I wrote back in 1994 about exactly this kind of flipped clinic.

Harry Glorikian: It was great having you on the show and I look forward to continuing the conversation hopefully in the future because I have a feeling this is going to, every 3 or 4 months we’re going to be saying, wow, and something new is going to be, you know, creeping up that we just didn’t expect.

Isaac Kohane: I totally agree. What a fun time to be alive.

Harry Glorikian: Have a wonderful day or wonderful afternoon.

Isaac Kohane: Thank you.

Harry Glorikian: Take care. Bye bye.
