Thinking Machines Just Built an AI That Talks and Listens at the Same Time

Watch a simultaneous translator at the UN. They listen to one language and speak another in the same second. The delay is so short the room forgets they are there.

No AI has ever done that (at least not like this). Until now.

Thinking Machines released a research preview of a model that listens, watches, and speaks at the same time. A second AI runs reasoning in the background and feeds answers in without breaking the thread.

Nine days earlier, Anthropic shipped three features that nobody outside the AI press paid attention to. One lets the AI organize what it learned overnight. One has it check its own work against a rubric. One lets a lead AI break a job into pieces and hand them to specialist AIs.

Pieces of the puzzle are slowly coming together. Shipped days apart. And the conversation everyone loves having is about benchmark scores.

That is the wrong conversation.

What Thinking Machines shipped

The model is called TML-Interaction-Small. Research preview. Wider release later this year.

Skip the name. The world of AI is horrible at naming. What it does is the story. It processes audio, video, and text in chunks of about a fifth of a second, all three at once. It can interrupt you. It can be interrupted. It can watch you while it talks to you. A second AI handles longer reasoning in the background and feeds answers back in without dropping the conversation.

The team had to write two of their own benchmarks because nothing else measured what the model does.

Mira Murati runs Thinking Machines. She used to be Chief Technology Officer at OpenAI. Her framing on the release was that how we work with AI matters as much as how smart it is.

She is right. And the implication is bigger than her own demo.

Today’s voice AI fakes real-time conversation by stitching pieces together around a model that was never built for it. Voice detection. Turn prediction. A model that waits for silence and then talks. Thinking Machines threw all of that out and trained the substrate from scratch. The interactivity is in the model now. Not bolted on around it.
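To make the contrast concrete, here is a toy sketch, not anything Thinking Machines has published. It assumes audio arrives in 200-millisecond chunks (the "fifth of a second" figure above) and compares when a wait-for-silence pipeline is allowed to start replying with when a model that evaluates every chunk could first react. The function names and the three-chunk silence threshold are hypothetical, chosen only for illustration.

```python
CHUNK_MS = 200  # "about a fifth of a second", per the article


def turn_based_first_reply_ms(chunks, silence_chunks_needed=3):
    """Earliest time (ms) a wait-for-silence pipeline may start replying.

    chunks: list of booleans, True where the user is speaking in that chunk.
    silence_chunks_needed: hypothetical end-of-turn threshold (an assumption).
    Returns None if the threshold is never reached.
    """
    silent_run = 0
    heard_speech = False
    for i, speaking in enumerate(chunks):
        if speaking:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= silence_chunks_needed:
                # The reply can begin once this chunk has fully arrived.
                return (i + 1) * CHUNK_MS
    return None


def per_chunk_first_reaction_ms(chunks):
    """Earliest time (ms) a model that decides on every chunk could react.

    Toy assumption: it can respond as soon as the first chunk containing
    speech has arrived, whether or not the user has stopped talking.
    """
    for i, speaking in enumerate(chunks):
        if speaking:
            return (i + 1) * CHUNK_MS
    return None


# One short utterance: silence, three chunks of speech, then a pause.
example = [False, True, True, True, False, False, False, True]
turn_based_ms = turn_based_first_reply_ms(example)
per_chunk_ms = per_chunk_first_reaction_ms(example)
```

On this made-up input the wait-for-silence pipeline cannot speak until 1,400 milliseconds in, while the per-chunk model could react at 400. The numbers are only as good as the assumptions. The point is structural: in the first design the delay is set by the silence threshold, and in the second it is set by the chunk size.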

That is a significant change.

What it means for the layer I’ve been writing about

I argue in my book that what makes an AI assistant powerful is not the chat window. It is what happens behind it. Memory. Action. Audit trail. Off switch. I stand by that.

But the chat window still must disappear.

Human intent does not arrive in finished sentences. It arrives mid-thought, mid-gesture, mid-glance. If the AI underneath only accepts polished prompts, the human is still doing the translation work the system was supposed to remove. What we have today is a “very fast assistant.” It cannot become ambient. It cannot watch.

Thinking Machines just proved the substrate can be built.

What is built and what is deployable are different problems. I use four questions in the book to test any AI assistant claim. Remember. Act. Show. Stop. Can it remember you across sessions? Can it act across your systems? Can it show you what it did? Can you stop it?

By that test, Thinking Machines built a foundation. Not a finished product. Their own announcement flags long memory as the open problem. Action is limited. The audit trail and the off switch are not addressed.

The race is whether you can put the whole thing together. Not which piece is best.

The model race did not end. It is evolving.

A week ago, I said the model is becoming a commodity. Half right.

Stanford released its 2026 AI Index on April 13. The top six labs in the world sit inside 80 Elo points on Arena. Anthropic at 1,503. xAI at 1,495. Google at 1,494. OpenAI at 1,481. Alibaba at 1,449. DeepSeek at 1,424. The top American model leads the top Chinese model by 2.7 percent. A year ago, the gap was 17 to 31 percent. Stanford’s own analysts said the competition has moved to cost, reliability, and industry-specific performance.
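For a sense of how small that spread is, here is a rough sketch using the six ratings quoted above and the textbook Elo expected-score formula. It assumes Arena scores can be read as classic Elo ratings on the usual 400-point scale. Arena's own rating method may differ in detail, so treat the percentages as approximate.

```python
# Ratings as quoted in the paragraph above.
ratings = {
    "Anthropic": 1503,
    "xAI": 1495,
    "Google": 1494,
    "OpenAI": 1481,
    "Alibaba": 1449,
    "DeepSeek": 1424,
}


def expected_score(rating_a, rating_b):
    """Textbook Elo expected score for A against B (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Gap between the highest and lowest of the six labs.
spread = max(ratings.values()) - min(ratings.values())

# Expected head-to-head preference rate: first versus sixth.
top_vs_sixth = expected_score(ratings["Anthropic"], ratings["DeepSeek"])

# Expected head-to-head preference rate: first versus second.
top_vs_second = expected_score(ratings["Anthropic"], ratings["xAI"])

print(f"spread: {spread} points")
print(f"first vs sixth: {top_vs_sixth:.1%}")
print(f"first vs second: {top_vs_second:.1%}")
```

Under those assumptions, the 79-point spread means the top-rated model would be preferred over the sixth in roughly 61 percent of head-to-head comparisons, and over the second in about 51 percent. That is close to a coin flip at the top of the table.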

That is not the same as saying the model is done.

The work that is left is the work no public leaderboard measures. Does the AI tell the truth about things it does not know? Does it finish a long task without losing the thread? Does it pick the right tool? Does it understand a clinician’s follow-up?

Mercor lives in that gap. They raised at a $10 billion valuation in October. They pay out more than $1.5 million a day to doctors, lawyers, engineers, and scientists who write the tests, build the practice environments, and grade the work the AI labs use to fix the long tail. They do not sell AI. They sell the inputs that make AI usable in your industry.

That is also part of the assembly.

“Good enough” stopped being a benchmark score. It is whether a clinician, an analyst, a lawyer trusts the output and uses it again the next day. Most conversations about AI are not there yet.

The compute question

On May 6, Anthropic signed a deal with Elon Musk’s SpaceX to take over a building in Memphis called Colossus 1. 300 megawatts. Over 220,000 high-end Nvidia chips. Within a month, every chip in that building runs Claude.

The day the deal closed, Anthropic doubled Claude Code’s rate limits, removed peak-hour throttling on Pro and Max, and raised Opus API limits. The product was being throttled by the cluster. The bottleneck was not the model. It was the wall socket.

That is a very important lesson.

An AI assistant that runs in the background of your life is not a chatbot you open once a day. It listens. Watches. Consolidates what it learned overnight. Grades its own work. Calls tools. Sends specialist AIs to handle pieces in parallel. Every layer takes compute. Multiply that across millions of users and the workload stops looking anything like today’s chat traffic.

That is why Anthropic just paid for 300 megawatts from a company that runs a competing system. It is also why they locked up roughly 5 gigawatts each from Amazon and from Google plus Broadcom, $30 billion of Azure capacity from Microsoft, and a $50 billion deal with Fluidstack. Nobody books that much if the model is the only thing that matters.

One important detail if you read the fine print.

Musk’s contract with Anthropic includes a “Humanity Clause.” If Musk decides Claude is “engaging in actions that harm humanity,” he can pull the compute back. He defines what that means.

Anthropic rented a critical piece of its production infrastructure from someone who reserves the right to unplug it. The compute layer became a governance layer this month. The chess game is fascinating if you watch closely.

World models

Two of the most respected AI researchers raised over $2 billion combined earlier this year.

Fei-Fei Li raised about $1 billion in February for World Labs at a $5 billion valuation. She taught at Stanford and helped kick off the modern AI era. Yann LeCun raised $1.03 billion in March for AMI Labs at $3.5 billion pre-money. Largest seed round in European history. He won the Turing Award, often called the Nobel Prize of computing, in 2018 and ran Meta’s research lab for 12 years.

Both are betting today’s AI is not the path to real intelligence.

LeCun’s public quote was that the path to superintelligence through today’s AI “is complete bullshit. It is just never going to work.”

I do not buy the dead-end framing. Today’s AI is not a dead end. It is a different road. The honest read from inside the field is that the next generation will use both. Today’s AI for language and reasoning. World models for physical space and cause-and-effect.

The part that matters for the layer I wrote about in the book is the second one.

A world model gives an AI assistant something today’s systems cannot have. A grasp of physical reality. What happens if you push the cup off the table. What happens if you take this turn at this speed. What happens if you give this drug to this patient with these other conditions.

That is the missing input for an AI assistant that operates in the exam room. On the factory floor. In the back of an ambulance. AMI Labs’ first commercial partner is Nabla, a healthcare AI company. Not an accident. In medicine, a confidently wrong answer can kill someone.

LeBrun, the AMI CEO, has been clear. First commercial products about a year out. Real applications three to five years from now. That is long enough that most AI strategies being signed this quarter will not plan for them. Short enough that they land before your next strategy refresh.

What this means for organizations

The benchmark race is over, and it was never the race. The assembly is the race. It’s the pieces working together that matter.

A model that is reliable when real people use it for real work. A front end that perceives continuously instead of waiting for a typed prompt. A back end that remembers, checks its own work, coordinates, and can be audited. Enough compute to run all of it.

The smartest model is not going to take all the marbles. The right assembly is.

Somebody at every AI company is making those calls. Which piece first. Which piece to defer. Which to build, which to buy, which to rent from a competitor. Those are strategy decisions, not engineering decisions. They are going to decide who wins adoption.

Find that person. Sometimes it is the CEO. Sometimes it is the chief scientist. Sometimes it is a chief product officer most of the market has never heard of. The title does not matter. What matters is who is driving the assembly and whether they understand how to bring something useful to a real user. Ask who that person is at the company whose product you are about to depend on. Ask what they decided to ship first. Ask what they decided to leave for later. Ask what they would do differently if they were starting today. The answers will tell you whether you are buying a product or a press release.

This thing is moving faster than anything I have seen. The PC took fifteen years to remake business. The internet took ten. Mobile took five. This is rewriting itself every nine days. I am writing this piece because Thinking Machines and Anthropic shipped two halves of the same machine inside a week and a half, and most people I talk to have not noticed.

The companies whose driver is moving at that speed will win the next decade. The ones whose driver is still picking by benchmark are going to wake up to a market that left them behind.

Own the default, own the data. Own the data, own the decade.

Harry Glorikian is the author of The Invisible Interface: How AI Turns Intentions Into Actions, And Who Wins (Ideapress Publishing, distributed by Simon & Schuster, June 2026).
