DeepSeek DeepDive + Hands-On With Operator + Hot Mess Express!

Released Friday, 31st January 2025

Episode Transcript

I just got my weekly... you know, I set up ChatGPT to email me a weekly affirmation before we start taping, because you can do that now with the tasks feature. Yeah, people say this is the most expensive way to email yourself a reminder. So what sort of affirmation did we get? Today it said: you are an incredible podcast host, sharp, engaging, and completely in command of the mic. Your taping today is going to be phenomenal, and you're going to absolutely kill it. Wow, and that's why it's so important that ChatGPT can't actually listen to podcasts, because I don't think it would say that if it actually ever heard us. It would say: just get this over with. Get on with it!

I'm Kevin Roose, a tech columnist at the New York Times. I'm Casey Newton from Platformer. And this is Hard Fork. This week, we go deeper on DeepSeek: ChinaTalk's Jordan Schneider joins us to break down the race to build powerful AI. Then, hello, Operator: Kevin and I put OpenAI's new agent software to the test. And finally, the train is coming back to the station for a round of Hot Mess Express.

Well, Casey, it is rare that we spend two consecutive episodes of this show talking about the same company, but I think it is fair to say that what is happening with DeepSeek has only gotten more interesting and more confusing. Yeah, that's right. It's hard to remember a story in recent months, Kevin, that has generated quite as much interest as what is going on with DeepSeek. Now, DeepSeek, for anyone catching up, is this relatively new Chinese AI startup that released some very impressive and cheap AI models this month that lots of Americans have started downloading and using. Yeah, so some people are calling this a Sputnik moment for the AI industry, when kind of every nation perks up and starts, you know, paying attention at the same time to the AI arms race. Some people are saying this is the biggest thing to happen in AI since the release of ChatGPT. But Casey, why don't you just catch us up on what has been happening since we recorded our emergency podcast episode just two days ago.

Well, I would say that there have probably been three stories, Kevin, that I would share to give you a quick flavor of what's been going on. One, a market research firm says DeepSeek was downloaded 1.9 million times on iOS in recent days and about 1.2 million times on the Google Play store. The second thing I would point out is that DeepSeek has been banned by the US Navy over security concerns, which I think is unfortunate, because what is a submarine doing, if not deep seeking? It was also banned in Italy, by the way, after the data protection regulator made an inquiry. And finally, Kevin, OpenAI says that there is evidence that DeepSeek distilled its models. Distillation is kind of the AI lingo, or euphemism, for: they used our API to try to unravel everything we were doing and use our data in ways that we don't approve of. And now Microsoft and OpenAI are jointly investigating whether DeepSeek abused their API.
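For a concrete picture of what "distillation" means here: the generic technique is to query a stronger teacher model and save its answers as supervised training data for a smaller student model. Below is a minimal sketch in Python, assuming the OpenAI client library; the prompts and file name are hypothetical, and this illustrates the general pattern, not anything DeepSeek is confirmed to have done.

```python
# Hypothetical sketch of API-based distillation: harvest a teacher
# model's answers, then fine-tune a student model on them.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompts = ["Explain photosynthesis simply.", "Write a haiku about rain."]  # hypothetical

with open("distill_data.jsonl", "w") as f:
    for prompt in prompts:
        reply = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        answer = reply.choices[0].message.content
        # Each (prompt, answer) pair becomes one supervised training
        # example for the smaller student model.
        f.write(json.dumps({"prompt": prompt, "completion": answer}) + "\n")

# A student model fine-tuned on distill_data.jsonl picks up much of the
# teacher's behavior, which is why labs restrict this in their terms of use.
```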

And of course, we can only imagine how OpenAI is feeling about the fact that their data might have been used without payment or consent. Oh yeah, must be really hard to think that someone might be out there training AI models on your data without permission. And I want to acknowledge that literally every single user on Bluesky already made this joke, but they were all funny, and I'm so happy to repeat it here on Hard Fork this week.

Now, Kevin, as always when we talk about AI, we have certain disclosures to make. The New York Times is suing OpenAI and Microsoft over copyright violations alleged related to the use of their copyrighted data to train AI models. I think that was good. That was very good. And I'm in love with a man who works at Anthropic.

Now, with that said, Kevin, there is even further we want to go into the DeepSeek story, and we want to do it with the help of Jordan Schneider. Yes, we are bringing in the big guns today, because we wanted to have a more focused discussion about DeepSeek that is not about, you know, the stock market, or how the American AI companies are reacting to this, but is about one of the biggest sets of questions that all of this raises, which is: what is China up to with DeepSeek and AI more broadly? What are the geopolitical implications of the fact that Americans are now obsessing over this Chinese-made AI app? What does it mean for DeepSeek's prospects in America? What does it mean for their prospects in China? And how does all this fit together from the Chinese perspective?

So Jordan Schneider is our guest today. He's the founder and editor-in-chief of ChinaTalk, which is a very good newsletter and podcast about US-China tech policy. He's been following the Chinese AI ecosystem for years. And unlike a lot of American commentators and analysts who were sort of surprised by DeepSeek and what they managed to pull off over the last couple weeks... I'll say it: I was surprised. Yeah, me too. But Jordan has been following this company for a long time, and a big focus of ChinaTalk, his newsletter and podcast, has been translating literally what is going on in China into English, making sense of it for a Western audience, and keeping tabs on all the developments there. So, perfect guest for this week's episode, and I'm very excited for this conversation. Yes, I have learned a lot from ChinaTalk in recent days as I've been boning up on DeepSeek, so we're excited to have Jordan here. Let's bring him in.

Jordan Schneider, welcome to Hard Fork. Oh my God, such a huge fan. This is such an honor. We're so excited. I have learned truly so much from you this week. And so when we were talking about what to do this week, we just looked at each other and said: we have got to see if Jordan can come on this podcast. Yeah.

So this has been a big week for Chinese tech policy. Maybe the biggest week for Chinese tech policy, at least that I can remember. I realized that something important was happening last weekend, when I started getting texts from, like, all of my non-tech friends being like: what is going on with DeepSeek? And I imagine you had a similar reaction, because you are a person who does constantly pay attention to Chinese tech policy. So I've been running ChinaTalk for eight years, and I can get my family members to maybe read, like, one or two editions a year. And the same exact thing happened with me, Kevin, where all of a sudden I got: oh my god, DeepSeek! Like, it's on the cover of the New York Post, Jordan, you're so clairvoyant, like, maybe I should read you more. I'm like, okay, thanks mom, I appreciate that.

Yeah, so I want to talk about DeepSeek and what they have actually done here, but I'm hoping first that you can kind of give us the basic lay of the land of the sort of Chinese AI ecosystem, because that's not an area where Casey or I have spent a lot of time looking. But tell us about DeepSeek and sort of where it sits in the overall Chinese industry.

So DeepSeek is a really odd one. It was born out of this very successful quant hedge fund, the CEO of which, basically after ChatGPT was released, was like: okay, this is really cool. I want to spend some money and some time and some compute, and hire some fresh young graduates, to see if we can give it a shot to make our own language models. And so a lot of companies are out there building their own large language models. What was the first thing that happened that made you think: oh, this company is actually making some interesting ones?

Sure, so there are lots and lots of very moneyed Chinese companies that have been trying to follow a similar path after ChatGPT. You know, we have giant players like Alibaba, Tencent, ByteDance, Huawei even, trying to, you know, create their own OpenAI, basically. And what is remarkable is the big organizations can't quite get their head around creating the right organizational, institutional structure to incentivize the type of collaboration and research that leads to real breakthroughs. So Chinese firms have been releasing models for years now, but DeepSeek, because of the way that it structured itself, and the freedom they had not necessarily being under a direct profit motive, were able to put out some really remarkable innovations that caught the world's attention, you know, starting maybe late December, and then, you know, really blew everyone's mind with the release of the R1 chatbot.

Yeah, so let's talk about R1 in just a second, but one more question for you, Jordan, about DeepSeek. What do we know about their motivation here? Because so much of what has been puzzling American tech industry watchers over the last week is that this is not a company that has sort of an obvious business model connected to its AI research. We know why Google is developing AI: because it thinks it's going to make the company Google much more profitable. We know why OpenAI is developing advanced AI models. It does not seem obvious to me, and I have not read anything from people involved in DeepSeek, about why they are actually doing this and what their ultimate goal is. So can you help us understand that?

can you help us understand that? So,

9:18

um... We don't have a lot of

9:20

data, but my base case, which is

9:22

based on two extended interviews that the

9:25

Deep C CEO released, which we've translated

9:27

on trying to talk, as well as

9:29

just like what Deep C employees have

9:32

been tweeting about in the West, and

9:34

then domestically, is that their dreamers. I

9:36

think the right mental model is open

9:39

AI, you know, 2017 to 2022. Like

9:41

I'm sure you could ask the same

9:43

thing, like, what the hell are they

9:46

doing? literally said. I have no idea

9:48

how we're ever going to make money,

9:50

right? And here we are in this

9:52

grand new paradigm. So I really think

9:55

that they do have this like vision

9:57

of AGI and like, look, we'll build

9:59

it and we'll make it cheaper for

10:02

everyone. And like, we'll figure it and

10:04

we'll make it cheaper for everyone. We'll

10:06

figure it out later. And like, we'll

10:09

figure it for everyone. We'll figure it

10:11

out. We'll make it cheaper for everyone.

10:13

We'll figure it out. dance or Ali

10:16

or Tencent or Huawei and the government's

10:18

going to start to pay attention in

10:20

a way which it really hasn't over

10:23

the past few years. Right and I

Right, and I want to drill down a little bit there, because I think one thing that most listeners in the West do know about Chinese tech companies is that many of them are sort of inextricably linked to the Chinese government: that the Chinese government has access to user data under Chinese law, that these companies have to follow the Chinese censorship guidelines. And so as soon as DeepSeek started to really pop in America over the last week, people started typing things into DeepSeek's model like: tell me about what happened at Tiananmen Square, or tell me about Xi Jinping, or tell me about the Great Leap Forward. And it just sort of wouldn't do it at all. And so people, I think, saw that and said: oh, this is like every other Chinese company that has this sort of hand-in-glove relationship with the Chinese ruling party. But it sounds, from what you're saying, like DeepSeek has a little bit more complicated a relationship to the Chinese government than maybe some other better-known Chinese tech companies. So explain that.

Yeah, I mean, I think it's important: the mental model you should have for these CEOs is not, like, people who are dreaming to spread Xi Jinping thought. What they want to do is compete with Mark Zuckerberg and Sam Altman and show that they're, like, really awesome technologists. But the tragedy is, let's take ByteDance, for example. You can look at Zhang Yiming, their CEO's, Weibo posts from 2012, 2013, 2014, which are super liberal in a Chinese context, saying, like, you know, we should have freedom of expression, like, we should be able to do whatever we want. And in the early years of ByteDance, there was a lot of relatively more subversive content on the platform, where you sort of saw, like, real poverty in China, you saw off-color jokes. And then all of a sudden in 2018, he posts a letter saying: I am really sorry, like, I need to be part of this sort of, like, Chinese national project, and, like, better adhere to, you know, modern Chinese socialist values, and I'm really sorry, and it won't ever happen again. You know, the same thing happened with Didi, right? Like, they don't really want to have anything to do with politics, and then they get on someone's bad side, and all of a sudden they get zapped. So they listed on a Western stock exchange after the Chinese government told them not to, and then they got taken off app stores, and it was a whole giant nightmare; like, they had to sort of go through their rectification process. So, point being, with DeepSeek, right, is, like, now they are, whether they like it or not, going to be held up as a national champion, and that comes with a lot of headaches and responsibilities: from, you know, potentially giving the Chinese government more access, to, you know, having to fulfill government contracts, which, like, honestly, are probably really annoying for them to do, and sort of distracting from the broader mission they have of developing and deploying this technology in the widest range possible. But, like, DeepSeek thus far has flown under the radar. That is no longer the case, and things are about to change for them.

Right, and I think that was one of the surprising things about DeepSeek for the people I know, including you, who follow Chinese tech policy. You know, I think people were surprised by the sophistication of their models, and we talked about that on the emergency pod that we did earlier this week, and how cheaply they were trained. But I think the other surprise is that they were released as open-source software, because one thing that you can do with open-source software is download it, host it in another country, remove some of the guardrails and the censorship filters that might have been part of the original model. But by the way, it turned out there weren't even really guardrails on the V3 model, right? It had not been trained to avoid questions about Tiananmen Square or anything. So that was another really unusual thing about this.

Right. And one thing that we know about Chinese technology products is that they don't tend to be released that way. They tend to be hosted in China, and overseen by Chinese teams who can make sure that they're not out there talking about Tiananmen Square. Is the open-source nature of what DeepSeek has done here part of the reason that you think there might be conflict looming between them and the Chinese government?

You know, honestly, I think this whole ask-it-about-Tiananmen stuff is a bit of a red herring on a few dimensions. So first, one of these, like, arguments that is a little sort of confusing to me is, like, folks used to say: oh, like, the Chinese models are going to be lobotomized, and, like, they will never be as smart as the Western ones, because, like, they have to be politically correct. I mean, look, if you ask Claude to say racist things, it won't, and Claude's still pretty smart. Like, this is sort of a solved problem, and a bit of a red herring when talking about sort of long-term competitiveness of Chinese and Western models. Now, you asked me, like: oh, so they released this model globally, and it's open source; maybe someone in the Chinese government would be uncomfortable with the fact that people can get a Chinese model to say things that would get you thrown in jail if you posted them online in China. It's going to be a really interesting calculus for the Chinese government to make, because on the one hand, this is the most positive shine that Chinese AI has gotten globally in the history of Chinese AI. But on the other hand, they have to navigate this, and it might prompt some uncomfortable conversations and bring regulators to a place they wouldn't have otherwise landed. Yeah.

Jordan, I want to ask you about something that people have been talking about and speculating about in relationship to the DeepSeek news for the last week or so, which is about chip controls. So we've talked a little bit on the show earlier this week about how DeepSeek managed to put together these models using some of these kind of second-rate chips from Nvidia that are allowed to be exported to China. We've also talked about the fact that you cannot get the most powerful chips legally if you are a Chinese tech company. So there have been some people, including Elon Musk and other American tech luminaries, who have said: oh, well, DeepSeek has this sort of secret stash of these banned chips that they have smuggled into the country, and actually they are not making do with kind of the Kirkland Signature chips that they say they are. What do we know about how true that is?

So, did DeepSeek have banned chips? It's kind of impossible to know. This is a question more for the US intelligence community than, like, Jordan Schneider on Twitter. But I do think that it is important to understand that the delta between what you can get in the West and what you can get in China is actually not that big. And we're talking about training a lot, but also, on the inference side, China can still buy this H20 chip from Nvidia, which is basically world-class at, like, deploying the AI and letting everyone use it. So does this mean that we should just give up? I don't think so. Compute is going to be a core input regardless of how much model distillation you're going to have in the future. There have been a lot of quotes, even from the DeepSeek founder, basically saying, like: the one thing that's holding us back are these export controls. Right.

Okay, I want to ask a big-picture question. Sure. It seems that a reason that people have been so fascinated by this DeepSeek story is that, at least for some folks, it seems to change our understanding of where China is in relation to the United States when it comes to developing very powerful AI. Jordan, what is your assessment of what the V3 and R1 models mean, and to what extent do you think the game has actually changed here?

I'm not really sure the game has changed so much. Like, Chinese engineers are really good. I think it is a reasonable base case that Chinese firms will be able to develop comparable models or fast-follow on the model side. But the real sort of long-term competition is not just going to be on developing the models, but deploying them, and deploying them at scale. And that's really where compute comes in, and that's why export controls are going to continue to be a really important piece of America's strategic arsenal when it comes to making sure that the 21st century is defined by the US and our friends, as opposed to China and theirs.

Right. So it's one thing to have a model that is about as capable as the models that we have here in the United States. It's another thing to have the energy to actually let everyone use them as much as they want to use them. What you're saying is: no matter what DeepSeek may have invented here, that fundamental dynamic has not changed. China simply does not have nearly the amount of compute that the United States has, as long as we don't screw up export controls.

So I think the sort of base case for me is that if the US stays serious about holding a line on semiconductor manufacturing equipment and export of AI chips, then it will be incredibly difficult for the Chinese sort of broader semiconductor and AI ecosystem to leap ahead, much less kind of, like, fast-follow beyond being able to develop comparable models. I'm feeling good as long as, you know, Trump doesn't make some, like, crazy deal: soybeans in exchange for ASML EUV machines. That would really break my heart.

I want to inject kind of a note of skepticism here, because I buy everything that you're saying about how DeepSeek's progress has been sort of bottlenecked by the fact that it can't get these very powerful American AI chips from companies like Nvidia. But I also am hearing people who I trust say things that make me think that actually the bottleneck may not be the availability of chips; that maybe, with some of these algorithmic efficiency breakthroughs that DeepSeek and others have been making, it might be possible to run a very, very powerful AI model on a conventional piece of hardware, on a MacBook even. And I wonder about how much of this is just, like, AI companies in the West trying to cope, trying to make themselves feel better, trying to reassure the market that they are still going to make money by investing billions and billions of dollars into building powerful AI systems. If these models do just become sort of lightweight commodities that you can run on a much less powerful cluster of computers, or maybe on one computer, doesn't that just mean we can't control the proliferation of them at all?

Yeah, I mean, I think this is one potential future, and maybe that potential future, like, went up 10 percentage points of likelihood: of, like, you being able to fit the biggest, baddest, smartest, most efficient AI model on something that can sit in your home. But I think there are lots of other futures in which sort of the world doesn't necessarily play out that way. And look, Nvidia went down 15 percent; it didn't go down 95 percent. Like, I think if we were really in that world where chips don't matter, because everything can be shrunk down to kind of consumer-grade hardware, then the sort of reaction that I think you would have seen in the stock market would have been even more dramatic than the kind of freakout we saw over this week. So we'll see. I mean, it would be a really remarkable kind of democratizing thing if that was the future we ended up living in, but it still seems pretty unlikely to my, you know, like, history-major brain here.

I would also just point out, Kevin, that when you look at what DeepSeek has done, they have created a really efficient version of a model that American companies themselves had trained, like, nine to 12 months ago. So they sort of caught up very quickly, and there are fascinating technological innovations in what they did. But in my mind, these are still primarily optimizations. Like, for me, what would tip me over into, like, oh my gosh, America is losing this race, is: China is the first one out of the gate with a virtual co-worker, right? Or, like, a truly phenomenal agent. Some sort of leap forward in the technology, as opposed to: we've caught up really quickly, and we've figured out something more efficiently. Are you seeing it differently than that?

I mean, I guess I just don't know what, like, a six-month lag would buy us, if it does take six months for the Chinese AI companies like DeepSeek to sort of catch up to the state of the art. You know, I was struck by Dario Amodei, who's the CEO of Anthropic; he wrote an essay just today about DeepSeek and export controls, and in it he makes this point about the sort of difference between living in what he called a unipolar world, where one country or one bloc of countries has access to something like an AGI or an ASI and the rest of the world doesn't, versus the situation where China gets there roughly around the same time that we do, and so we have this bipolar world where two blocs of countries, the East and the West, basically have access to this equivalent technology. And so... And of course, in a bipolar world, sometimes we're very happy and sometimes we're very sad. Exactly. So I just think, like, whether we get there, you know, six months ahead of them or not, I just feel like there isn't that much of a material difference. But Jordan, maybe I'm wrong; can you make the other side of that, that it really does matter?

I'm kind of there. I, you know, I'll take a little bit of issue with what Dario says. And I think, you know, one of the lessons that DeepSeek shows is we should expect a base case of Chinese model makers being able to fast-follow the innovations, which, by the way, Casey, actually do take those giant data centers to run all the experiments in order to find out, you know, what is the sort of future direction you want to take your model. And what sort of AI is going to come down to is not just creating the model, not just sort of, like, Dario envisioning the future and then all of a sudden, like, things happen. Like, there's going to be a lot of messiness in the implementation, and there are going to be, sort of, like, teachers' unions who are upset that AI comes into the classroom, and there are going to be, like, all these regulatory pushbacks, and a lot of societal reorganization which is going to need to happen, just like it did during the Industrial Revolution. So, look, model making is a frontier of competition, but there's also this broader question of how a society will kind of adopt and cope with all of this new future that's going to be thrown in our faces over the coming years. And I really think it's that, just as much as the model development and the compute, which is going to determine which countries are going to gain the most from what AI is going to offer us.

Yeah. Well, Jordan, thank you so much for joining and explaining all of this to us. I feel more enlightened. Me too. Oh, my pleasure. My chain of thought has just gotten a lot longer. That's an AI joke.

When we come back: Kevin, there's an agent at our door. Is it Jerry Maguire? No, it's an AI one. Oh, okay. Jerry the Choir! I don't know!

Operator, information, get me Jesus on the line. Do you know that one? No. Do you know Operator by Jim Croce? No. Operator, oh, won't you help me place this call? Well, Casey, call your agent, because today we're talking about AI agents. Why do I need to call my agent? I don't know, it just sounded good. Okay, well, I appreciate the effort.

But yes, Kevin, because for months now, the big AI labs have been telling us that they are going to release agents this year. Agents, of course, being software that can essentially use your computer on your behalf, or use a computer on your behalf. And the dream is that you have sort of a perfect virtual assistant or co-worker, you name it. If they are somebody who might work with you at your job, the AI labs are saying: we are building that for you.

Yeah, so last year, toward the end of the year, we started to see kind of these demos, these previews, that companies like Anthropic and Google were working on. Anthropic released something called Computer Use, which was an AI agent, a sort of very early preview of that. And then Google had something called Project Mariner that I got a demo of, I believe in December, that was basically the same thing, but their version of it. And then just last week, OpenAI announced that it was launching Operator, which is its own version of an AI agent. And unlike Anthropic's and Google's, which, you know, you either had to be a developer or part of some early testing program to access, you and I could try it for ourselves by just upgrading to the $200-a-month Pro subscription of ChatGPT.

Yeah, and I will say that, as somebody who's willing to spend money on software all the time, I thought: am I really about to spend $200 to do this? But, you know, in the name of science, Kevin, I had to. At this point, I am spending more on AI subscription products than on my mortgage. I'm pretty sure that's correct. You know, it's worth it. We do it for journalism. We do. So we both spent a couple of days putting Operator through its paces, and today we want to talk a little bit about what we found.

Yeah, so would you just explain what Operator is and how it works? Yeah, sure. So Operator is a separate subdomain of ChatGPT. You know, sometimes ChatGPT will just let you pick a new model from a drop-down menu, but for Operator, you've got to go to a dedicated site. Once you do, you'll see a very familiar chatbot interface, but you'll see different kinds of suggestions that reflect some of the partnerships that OpenAI has struck up. So, for example, they have partnerships with OpenTable and StubHub and Allrecipes, which are meant to give you an idea of what Operator can do. And frankly, Kevin, not a lot of this sounds that interesting, right? Like, the suggestions are on the order of: suggest a 30-minute meal with chicken, or reserve a table for eight, or find the most affordable passes to the Miami Grand Prix. Again, so far, kind of so boring.

What is different about Operator, though, is that when you say, okay, find the most affordable passes to the Miami Grand Prix, when you hit the enter button, it is going to open up its own web browser, and it's going to use this new model that they have developed to try to actually go and get those passes for you.

Yeah, so this is an important thing, because I think, you know, when people first heard about this, they thought: okay, this is an AI that kind of takes over your computer, takes over your web browser. That is not what Operator does. Instead, it opens a new browser inside your browser, and that browser is hosted on OpenAI's servers. It doesn't have your bookmarks and stuff like that saved, but you can take it over from the autonomous AI agent if you need to click around or do something on it. But it basically exists as... it's a browser within a browser. Yeah.

So one of the ideas of Operator is that you should be able to leave it unsupervised and just kind of go do your work while it works. But of course, it is very fun, initially at least, to watch the computer try to use itself. And so I sat there in front of this browser within a browser, and I watched this computer move a mouse around, type the, you know, URL, navigate to a website, and, you know, in the example I just gave, actually search for passes to the Miami Grand Prix.

Yeah, and it's interesting on a slightly more technical level, because until now, if an AI system like ChatGPT wanted to interact with some other website, it had to do so through an API, right? APIs, application programming interfaces, are sort of the way that computers talk to each other. But what Operator does is essentially eliminate the need for APIs, because it can just click around on a normal website that is designed for humans and behave like a human, and you don't need a special interface to do that.
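To make that contrast concrete, here is a minimal Python sketch of the two approaches. The endpoint, URL, and page selectors are hypothetical, and Playwright stands in for the click-and-type loop an agent performs; OpenAI has not published how Operator drives its browser.

```python
# 1. The API way: structured and fast, but the site must have built one.
import requests

resp = requests.post(
    "https://api.example-restaurant.com/v1/reservations",  # hypothetical endpoint
    json={"party_size": 8, "time": "2025-02-14T19:00"},
    headers={"Authorization": "Bearer MY_API_KEY"},
)
print(resp.json())

# 2. The browser-agent way: drive the human-facing site directly, so no
# special interface is needed. (Playwright is a stand-in for the agent's
# own click-and-type loop.)
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.example-restaurant.com/reserve")  # hypothetical URL
    page.fill("#party-size", "8")   # type into the form like a human would
    page.fill("#time", "7:00 PM")
    page.click("text=Reserve")      # click the same button a human would
    browser.close()
```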

Yeah, and now some people might hear that, Kevin, and start screaming, because what they will say is: APIs are so much more efficient than what Operator is doing here. APIs are very structured. They're very fast. They let computers talk to each other without having to, for example, open up a browser. But APIs have to be built, and there is a finite number of them. The reason that OpenAI is going through this exercise is because they want a true general-purpose agent that can do anything for you, whether there is an API for it or not.

And maybe we should just pause for a minute there and zoom out a little bit to say: why are they building this? Like, what is the long-term vision here? Sure. So the vision is to create virtual co-workers. Kevin, this is the North Star for the big AI labs right now. Many of them have said that they are trying to create some kind of digital entity that you can just hire as a co-worker. The first ones will probably be engineers, because these systems are already so good at writing code, but eventually they want to create virtual consultants, virtual lawyers, virtual doctors, you name it. Virtual podcast hosts? Let's hope they don't go that far. But everything else is on the table, and if they can get there, you know, presumably there are going to be huge profits in it for them, there are going to potentially be huge productivity gains for companies, and then there's, of course, the question of: well, what does this mean for human beings? And I think that's somewhat murkier.

Right, and I think it also helps to justify the cost of running these things, because $200 a month is a lot to pay for a version of ChatGPT, but it's not a lot to pay for a remote worker. And if you could, say, use the next version of Operator, or maybe two or three versions from now, to, say, replace a customer service agent or someone in your billing department, that actually starts to look like a very good deal. Absolutely. Or even, if I could bring it into the realm of journalism, Kevin: if I had a virtual research assistant and I said, hey, I'm going to write about this today, find information about this from the past couple of years, and maybe organize it in such a way that I might, you know, write a column based off of it, like, yeah, that's absolutely worth $200 a month to me.

Okay, so, Casey, walk me through something that you actually asked Operator to do for you, and what it did autonomously, on its own. Sure, I'll maybe give, like, two examples: a pretty good one and maybe a not-so-good one. The pretty good one, and this was actually suggested by Operator, was to use TripAdvisor to look up walking tours in London that I might want to do the next time I'm in London. When I did that... When are you going to London? I'm not actually going to London. Oh, so you lied to the AI? And not for the first time. But here's what I'll say: if anybody wants to bring Kevin and me to London, get in touch. We love the city. Yep.

So I said: okay, Operator, sure, let's do it. Let's find me some walking tours. I clicked that, and it opened a browser. It went to TripAdvisor, it searched for London walking tours, it read the information on the website, and then it presented it to me, and it did all that within a couple of minutes. Now, on one hand, could I have done that just as easily by Googling? Could I probably have done it even faster if I'd done it myself? Sure. But if you're just sort of interested in the technical feat that is getting one of these models to open a browser, navigate to a website, read it, and share information back, it's a computer using itself, you know, going around, like, typing things and selecting things from drop-down menus. Yeah, it's sort of like, you know, if you think it is cool to be in a self-driving car, like, this is that, but for your web... a self-driving browser. It is a self-driving browser. So that's the good example. Yes. What was another example?

35:15

that's the good example yes what was

35:18

another example so another example and this

35:20

was something else that open AI suggested

35:22

that we try was to try to

35:24

use operator to buy groceries and they

35:26

have a partnership with instakart And so

35:28

I thought, okay, they're gonna have like

35:30

sort of dialed this in so that

35:32

there's a pretty good experience. And so

35:35

I said, okay, let's go ahead and

35:37

buy groceries and I went to operator

35:39

and I said something like, hey, can

35:41

you help me buy groceries on Instagram?

35:43

And it said, sure. And here's what

35:45

it did. It opened up, Instagram, in

35:47

a browser, so far, so good. And

35:50

then it started searching for milk in

35:52

stores located in Des Moines, Iowa. Now,

35:54

you do not live in Des Moines,

35:56

Iowa, so why did it think that

35:58

you did? As best as I can

36:00

tell, the reason it did this is

36:02

that in... Instagram defaultsts to searching for

36:04

grocery stores in the local area and

36:07

the server that this instance of operator

36:09

was running on was in Iowa. Now,

36:11

if you were designing a grocery product

36:13

like Instagram, and Instagram does this, when

36:15

you first sign on and say you're

36:17

looking for groceries, it will say quite

36:19

sensibly, where are you? operator does not

36:22

do this. Instagram might also offer suggestions

36:24

for things that you might want to

36:26

buy. It does not just assume that

36:28

you want milk. Wow, I'm just picturing

36:30

like a house in Des Moines Iowa

36:32

where there's just like a palette of

36:34

milk being delivered every day from all

36:36

these poor operator users. Yes. So I

So I thought: okay, whatever, you know, this thing makes mistakes. Let's hope that it gets on the right track here. And so I tried to pick the grocery store that I wanted it to shop at, which is, you know, in San Francisco, where I live, and it entered that grocery store's address as the delivery address. So, like, it would try to deliver groceries, presumably from Des Moines, Iowa, to my grocery store, which is not what I wanted. And it actually could not solve this problem without my help. I had to take over the browser, log into my Instacart account, and tell it which grocery store I wanted to shop at. So already, all of this has taken at least 10 times as long as it would have taken me to do this myself.

Yeah, so I had some similar experiences. The first thing that I had Operator try to do for me was to buy a domain name and set up a web server for a project that you and I are working on that we can't really talk about yet. Secret project. Secret project. And so I said to Operator: go research available domain names related to this project, buy the one that costs less than $50, and then buy hosting, and set it up, and configure all the DNS settings and stuff like that. Okay, so that's, like, a true multi-step project, and something that would have been legitimately very annoying to do yourself. Yeah.

That would have taken me, I don't know, half an hour to do on my own, and it did take Operator some time. I had to kind of, like, set it and forget it, and, like, I got myself a snack and a cup of coffee, and then, when I came back, it had done most of these tasks. Really? Yes. I had to still do things like take over the browser and enter my credit card number. I had to give it some details about, like, my address for the sort of registration for the domain name. I had to pick between the various hosting plans that were available on this website. But it did 90% of the work for me, and I just had to sort of take over and do the last mile.

And this is really interesting, because what I would have assumed was that it would get, like, I don't know, 5% of the way, and it would hit some hiccup, and it just wouldn't be able to figure something out until you came back and saved it. But it sounds like, from what you're saying, it was somehow able to work around whatever unanswered questions there were and still get a lot done while you weren't paying attention. It felt a little bit like training, like, a very new, very insecure intern, because, like, at first it would keep prompting me, like: well, do you want a .com or a .net? And eventually you just have to prompt it and say, like: make whatever decisions you want. Wait, you said that to it? Yes, I said, like: only ask for my intervention if you can't progress any farther; otherwise, just make the most reasonable decision. You said: I don't care how many people you have to kill, just get me this domain. And it said: understood, sir. Yeah, and I'm now wanted in 42 states.

Anyway, that was one thing that Operator did for me that was pretty impressive. That feels like a grand success compared to what I got Operator to do. Yeah, it was pretty impressive. I also had it send lunch to one of my coworkers, Mike Isaac, who was hungry because he was on deadline, and I said: go to DoorDash and get Mike some lunch. It did initially mess up that process, because it decided to send him tacos from a taco place, which is great, and it's a taco place I know is very good. But I had said order enough for two people, and it sort of ordered two tacos. And this is one of those places where the tacos are quite small. Operator clearly sized the portions under the usual American standard. Yeah, so then I had to go in and say: does that sound like enough food, Operator? And it said: actually, now that you mention it, I should probably order more.

40:27

in these cases, it is the first

40:29

step that you log into your account, because

40:31

it doesn't have any of your payment details

40:33

or anything. So at what point are you

40:36

actually sort of teaching at that? It depends

40:38

on the website, so sometimes you can just

40:40

say. up front like here is my email

40:42

address or here is my login information

40:44

and it will sort of you know

40:46

log you in and do all that.

40:48

Sometimes you take over the browser. There

40:50

are some privacy features that are probably

40:52

important to people where it says open

40:54

AI says that it does not take

40:56

screenshots of the browser while you are

40:58

in control of it because you might

41:00

not want your credit card information getting

41:02

sent to open AI servers or anything like

41:05

that. Sometimes it happens at the beginning of

41:07

the process, sometimes it happens like when you're

41:09

checking out at the end. And so were

41:11

you taking it over to log in or

41:14

were you saying, I don't care, and you

41:16

just like were giving operator your door dash

41:18

password and play text? I was taking it

41:20

over. Okay, smart. Yeah. So. Those were the

So those were the good things. I also... this was a fun one. I wanted to see if Operator could make me some money. So I said: go take a bunch of online surveys, because, you know, there are all these websites where you can, like, get a couple cents for, like, filling out an online survey. Something that most people don't know about Kevin is he devotes 10% of his brain at any given time to thinking about schemes to generate money, and it's one of my favorite aspects of your personality that I feel like doesn't get exposed very much. But this is truly the most Roose-ian approach to using Operator I can imagine. So I can't wait to find out how this went. Well, the most Roose-ian approach might have been what I tried just before this, which was to have it go play online poker for me. But it did not do it. It said: I can't help with gambling or lottery-related activities. Okay, woke AI. Does the Trump administration know about this? But it was able to actually fill out some online surveys for me, and it earned a dollar and 20 cents. Is that right? Yeah, in about 45 minutes. So if you had it going all month, presumably you could maybe eke out the $200 to cover the cost of Operator Pro? Yes, and I'm sure I spent hundreds of dollars' worth of GPU computing power just to be able to make that dollar and 20 cents. But hey, it worked.
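For the curious, the back-of-the-envelope math on "eking out the $200" does nominally check out, assuming (generously) that the $1.20-per-45-minutes rate could hold around the clock:

```python
# Naive extrapolation of the survey earnings; purely illustrative.
earnings, minutes = 1.20, 45
hourly = earnings / (minutes / 60)       # $1.60 per hour
monthly = hourly * 24 * 30               # about $1,152 if run nonstop
print(f"${hourly:.2f}/hr -> ${monthly:,.0f}/mo vs. the $200 subscription")
```

So yes, it clears $200 on paper, at least before counting the GPU bill Kevin mentions.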

So those were some of the things that I tried. There were some other things that it just would not do for me, no matter how hard I tried. One of them was: I was trying to update my website and put some links to articles that I'd written on my website, and what I found after trying to do this was that there are just websites where Operator is not allowed to go. And so when I said to Operator, go pull down these New York Times articles that I wrote and, you know, put them onto my website, it said: I can't get to the New York Times website. I'm going to guess you expected that to happen. Well, I thought maybe it has some clever workaround, and maybe I should alert the lawyers at the New York Times if that's the case. But no, I assumed that if any website were to be blocking the OpenAI web crawlers, it would be the New York Times. There are other websites that have also put up similar blockades to prevent Operator from crawling them. Reddit, you cannot go onto with Operator. YouTube, you cannot go onto with Operator. Various other websites. GoDaddy, for some reason, did not allow me to use Operator to buy a domain name there, so I had to use another domain name site to do that. So right now there are some pretty janky parts of Operator, and I would not say that most people would get a lot of value from using it. But what do you think?

Well... I do think that there is something just undeniably cool about watching a computer use itself. Of course, it can also be quite unsettling. A computer that can use itself can cause a lot of harm. But I also think that it can do a lot of good, and so it was fun to try to explore what some of those things could be.

And to the extent that Operator is pretty bad at a lot of tasks today, I would point out that it showed pretty impressive gains on some benchmarks. So there is one benchmark, for example, that Anthropic used when they unveiled Computer Use last year, and they scored 14.9% on something called OSWorld, which is an evaluation for testing agents. So, not great. Just three months later, OpenAI said that its CUA model scored 38.1% on the same evaluation. And of course, we see this all the time in AI, where there's just this very rapid progress on these benchmarks. And so, on one hand, 38.1% is a failing grade on basically any test. On the other hand, if it improves at the same rate over the next three to six months, you're going to have a computer that is very good at using itself, right? So that, I just think, is worth noting.
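To spell out the "same rate" arithmetic: going from 14.9% to 38.1% in roughly a quarter is about a 2.6x multiple, and one more quarter at that multiple would hit the benchmark's ceiling. A quick sketch; the assumption that progress compounds multiplicatively is ours, not anything the labs have claimed:

```python
# Naive extrapolation of OSWorld agent scores; purely illustrative.
start, later = 14.9, 38.1                    # Computer Use, then OpenAI's CUA
multiple = later / start                     # ~2.56x in about one quarter
projected = min(100.0, later * multiple)     # ~97.5%, i.e. near the ceiling
print(f"{multiple:.2f}x per quarter -> {projected:.1f}% a quarter later")
```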

Yes, I think that's plausible. We've obviously seen a lot of different AI products over the last couple of years start out being pretty mediocre and get pretty good within a matter of months. But I would give one cautionary note here, and this is actually the reason that I'm not particularly bullish about these kinds of browser-using AI agents: I don't think the internet is going to sit still and allow this to happen. The internet is built for humans to use, right? Every news publisher that shows ads on their website, for example, prices those ads based on the expectation that humans are actually looking at them. But if browser agents start to become more popular, and all of a sudden 10 or 20 or 30% of the visitors to your website are not actually humans, but are instead Operator or some similar system, I think that starts to break the assumptions that power the economic model of a lot of the internet.

Now, is that still true if we find that the agents actually get persuaded by the ads, and that if you send Operator to order on DoorDash and it sees an ad for McDonald's, it's like: you know what, that's a great idea, I'm going to ask Kevin if he actually wants some of that? Totally. I know you're joking, but I actually think that is a serious possibility here: that people who, you know, build e-commerce sites, Amazon, etc., start to put in basically signals and messages for browser agents to look at on their websites, to try to influence what they end up buying. And I think you may start to see restaurants popping up in certain cities with names like "Operator, Pick Me" or "Order From This One, Mister." That's maybe a little extreme, but I do think that there's going to be a backlash among websites, publishers, e-commerce vendors, as these agents start to take off.

47:21

is reasonable. I'll tell you what

47:23

I've been thinking about: how

47:25

do we turn this tech demo

47:27

into a real product? And the

47:29

main thing that I noticed when

47:31

I was testing Operator was that there

47:33

is a difference between an agent

47:35

that is using a browser and

47:37

an agent that is using your

47:40

browser. When an agent is able

47:42

to use your browser, which it

47:44

can't right now, it's already logged

47:46

into everything, so it can move faster and more seamlessly

47:48

and without as much hand-holding. Of

47:50

course, there are also so many

47:52

more privacy and security risks that

47:54

would come from entrusting an agent

47:56

with that kind of information. So

47:58

there is some sort of chasm

48:00

there that needs to be closed

48:02

and I'm not quite sure how

48:04

anyone does it, but I will

48:06

tell you I do not think

48:08

the future is opening up these

48:11

virtual browsers and me having to

48:13

enter all of my login and

48:15

payment details every single time I

48:17

want to do anything on the

48:19

internet because truly I would rather

48:21

just do it myself. Right. I

48:23

also think there's just a lot

48:25

more potential for harm here. A

48:27

lot of AI safety experts I've

48:29

talked to are very worried about

48:31

this, because what you're essentially doing

48:33

is letting the AI models make

48:35

their own decisions and actually carry

48:37

out tasks. And so you can

48:40

imagine a world where an AI

48:42

agent that's very powerful, a couple

48:44

versions from now, decides to start

48:46

doing cyberattacks because maybe some

48:48

malevolent user has told it to

48:50

make money and it decides that

48:52

the best way to do that

48:54

is by hacking into people's crypto

48:56

wallets and stealing their crypto. Yeah.

48:58

Those are the kinds of reasons

49:00

that I am a little more

49:02

skeptical that this represents a big

49:04

breakthrough. But I think it's

49:06

really interesting and it did give

49:09

me that feeling of, like, wow,

49:11

this could get really good really

49:13

fast. And if it does, the

49:15

world will look very different. When

49:17

we come back, Kevin, back that

49:19

caboose up. It's time for the

49:21

Hot Mess Express. You know, Roose

49:23

Caboose was my nickname in middle

49:25

school. Kevin Caboose. Choo-choo! Well,

49:43

Casey, we're here wearing our train

49:45

conductor hats, and my child's train

49:47

set is on the table in

49:49

front of us, which can only

49:51

mean one thing. We're going to

49:54

train a large language model. Nope,

49:56

that's not what that means. It

49:58

means it's time to play a

50:00

game of the Hot Mess Express.

50:02

Pause for theme song. Hot Mess

50:04

Express, Kevin, is our segment where

50:06

we run through some of the

50:09

messiest recent tech stories and deploy

50:11

our official hot mess thermometer to

50:13

tell you just how messy we

50:15

think things have gotten. And Kevin,

50:17

you better sit down for this

50:19

one. So why don't we go

50:21

ahead, fire up the Hot Mess

50:24

Express, and see what is the

50:26

first story. Yeah, I hear that.

50:28

I hear a faint chug-a-chugga in

50:30

my headphones. Oh, it's pulling into

50:32

the station, Casey. What's the first

50:34

cargo that our Hot Mess Express

50:37

is carrying? All right, Kevin, this

50:39

first story comes to us from

50:41

the New York Times, and it

50:43

says that Fable, a book app,

50:45

has made changes after some offensive

50:47

AI messages. Okay, Casey, have you

50:49

ever heard of Fable, the book

50:52

app? Well, not until this story,

50:54

Kevin, but I am told that

50:56

it is an app for sort

50:58

of keeping track of what you're

51:00

reading, not unlike Goodreads,

51:02

but also for discussing what you're

51:04

reading, and apparently this app also

51:07

offers some AI chat. Yeah, you

51:09

can have AI sort of summarize

51:11

the things that you're reading in

51:13

a personalized way. And this story

51:15

said that in addition to spitting

51:17

out bigoted and racist language, the

51:19

AI inside Fable's book app had

51:22

told one reader who had just

51:24

finished three books by Black authors,

51:26

quote, your journey dives deep into

51:28

the heart of Black narratives and

51:30

transformative tales, leaving mainstream stories gasping

51:32

for air. Don't forget to surface

51:34

for the occasional white author, okay?

51:37

And another personalized AI summary that

51:39

Fable produced told another reader that

51:41

their book choices were, quote, making

51:43

me wonder if you're ever in

51:45

the mood for a straight cis

51:47

white man's perspective. And if you

51:50

are interested in a straight cis

51:52

white man's perspective, follow Kevin Roose

51:54

on x.com. Now, Kevin, why do

51:56

we think this happened? I don't

51:58

know, Casey. This is a head-scratcher

52:00

for me. I mean, we know

52:02

that these apps can spit out

52:05

biased things. That is just sort

52:07

of like part of how they

52:09

are trained and part of what

52:11

we know about them. I don't

52:13

know what model Fable was using

52:15

under the hood here, but yeah,

52:17

this seems not great. Well, it

52:20

seems like we've learned a lesson

52:22

that we've learned more than once

52:24

before, which is that large language

52:26

models are trained on the internet,

52:28

which contains near-infinite racism. So

52:30

there are mitigations that you can

52:32

take against that, but it appears

52:34

that in this case, they were

52:36

not successful. Fable's head of community,

52:39

Kim Marsh Alley, has said that

52:41

all features using AI are being

52:43

removed from the app, and a

52:45

new app version is being submitted

52:47

to the App Store. So you

52:49

always hate it when the first

52:51

time you hear about an app

52:53

is that they added AI, and

52:55

it made it super racist, and

52:57

they have to redo the app.

52:59

Do you think this poses any sort of competitive threat

53:01

to Grok, which until this story was

53:04

the leading racist AI app on the

53:06

market? I do think so and I

53:08

have to admit that all the folks

53:10

over at Grok are breathing a sigh

53:12

of relief now that they have once

53:14

again claimed the mantle. All right, Casey,

53:16

how hot is this mess? Well, Kevin,

53:18

in my opinion, if your AI is

53:21

so bad that you have to remove

53:23

it from the app completely, that's a

53:25

hot mess. Yeah, I rate this one

53:27

a hot mess as well. All right,

53:29

next stop: Amazon pauses drone

53:31

deliveries after aircraft

53:33

crashed in rain. Casey,

53:35

this story comes to us

53:37

from Bloomberg, which had a different

53:40

line of reporting than we did

53:42

just a few weeks ago on

53:44

the show about Amazon's drone program

53:47

Prime Air. Casey, what happened to

53:49

Amazon Prime Air? If you heard

53:51

the episode of Hard Fork where

53:54

we talked about it, Amazon Prime

53:56

Air delivered us some Brazilian Bum Bum

53:58

cream, and it did so without

54:00

incident. However, Bloomberg reports that Amazon has

54:03

now had to pause all of its

54:05

commercial drone deliveries after two of its

54:07

latest models crashed in rainy weather at

54:10

a testing facility. And so the company

54:12

says it is immediately suspending drone deliveries

54:14

in Texas and Arizona and will now

54:16

fix the aircraft software. Kevin, how did

54:19

you react to this? Well, I think

54:21

it's good that they're suspending drone deliveries

54:23

before they fix the software, because these

54:26

things are quite heavy, Casey. I would

54:28

not want one of them to fall

54:30

on my head. And I have to

54:32

tell you, this story gave me the

54:35

worst kind of flashbacks because in 2016

54:37

I wrote about Facebook's drone, Aquila, and

54:39

what the company told me

54:42

had been its first successful test flight

54:44

in its mission to deliver internet around

54:46

the world via drone. What the company

54:48

did not tell me when I was

54:51

interviewing its executives, including Mark Zuckerberg, was

54:53

that the plane had crashed after that

54:55

first flight. And so... it was a

54:57

small detail. I'm sure it was an

55:00

innocent omission. Yes, I'm sure. Well, it

55:02

was Bloomberg again, who reported, you know,

55:04

a couple months after I wrote this

55:07

story, that the Facebook drone had crashed.

55:09

I was, of course, hugely embarrassed and,

55:11

you know, wrote a bunch of stories

55:13

about this. But anyways, it really should

55:16

have occurred to me when we were

55:18

out there watching the Amazon drone, that

55:20

this thing was also probably secretly crashing,

55:23

and we just hadn't found out about

55:25

it yet. And indeed, now we have learned

55:27

it. We have to ask them now,

55:29

did this thing actually crash? I'm tired

55:32

of being burned. Now Casey, we should

55:34

say, according to Bloomberg, these drones reportedly

55:36

crashed in December. We visited Arizona to

55:39

see them in very early December, so

55:41

most likely, you know, this all happened

55:43

after we saw them. But I think

55:45

it's a good idea to keep in

55:48

mind, as we're talking about these

55:50

new and experimental technologies, that many of

55:52

them are still having the kinks worked

55:55

out. All right Kevin, so let's get

55:57

out the thermometer. How hot of

55:59

a mess is this? I would say

56:01

this is a moderate mess. Look, these

56:04

are still testing programs. No one was

56:06

hurt during these tests. I am glad

56:08

that Bloomberg reported on this. I'm glad

56:11

that they've suspended the deliveries. These things

56:13

could be quite dangerous flying through the

56:15

air. I do think it's one of

56:17

a string of reported incidents with these

56:20

drones. So I think they've got some

56:22

quality-control work ahead of them, and

56:24

I hope they do well on it

56:27

because I want these things to exist

56:29

in the world and be safe for

56:31

people around them. All right. I will

56:33

agree with you and say that this

56:36

is a warm mess, and hopefully they

56:38

can get it straightened out over there. Let's

56:40

see what else is coming down the

56:43

tracks. Fitbit has agreed to pay $12

56:45

million for not quickly reporting burn risk

56:47

with watches. Kevin, did you hear about

56:49

this? I did. This was the Fitbit

56:52

devices that were, like, literally burning people. Yes,

56:54

from 2018 to March of 2022, Fitbit

56:56

received at least a hundred and seventy

56:59

four reports globally of the lithium-ion

57:01

battery in the Fitbit Ionic watch overheating,

57:03

leading to a hundred and eighteen reported

57:05

injuries, including two cases of third-degree

57:08

burns and four of second-degree burns.

57:10

That comes from the New York Times'

57:12

Adeel Hassan. Kevin, I thought these things

57:15

were just supposed to burn calories. Well,

57:17

it's like I always say, exercising is

57:19

very dangerous and you should never do

57:21

it. And this justifies my decision not

57:24

to wear a Fitbit. To me,

57:26

the biggest surprise of this story was

57:28

that people were wearing Fitbits from

57:31

2018 to March of 2022. I thought every

57:33

Fitbit had been purchased by, like, 2011

57:35

and then put in a drawer never

57:37

to be heard again. So what is

57:40

going on with these sort of late-

57:42

stage Fitbit buyers? I'd love to find

57:44

out. But of course, we feel terrible

57:47

for everyone who was burned by a

57:49

Fitbit, and it's not going to be

57:51

the last time technology burns you. I

57:53

mean, realistically. That's true. That's true. Now,

57:56

what kind of mess is this? I

57:58

would say this is a hot mess.

58:00

This is officially hot. Literally hot.

58:03

They're hot. Here's my sort of rubric.

58:05

If technology physically burns you, it is

58:07

a hot mess. If you have physical

58:09

burns on your body, what other kind

58:12

of mess could it be? It's true.

58:14

That's a hot mess. Okay, next stop

58:16

on the Hot Mess Express. Google says

58:19

it will change Gulf of Mexico to

58:21

Gulf of America in Maps app after

58:23

government updates. Casey, have you been following

58:25

this story? I have, Kevin. Every morning,

58:28

when I wake up, I scan America's

58:30

maps and I say, what has been

58:32

changed? And if so, has it been

58:34

changed for political reasons? And this was

58:37

probably one of the biggest examples of

58:39

that we've seen. Yeah, so this was

58:41

an interesting story that came out in

58:44

the past couple of days. After President Trump spent his first days in

58:46

office and said that he was changing

58:48

the name of the Gulf of Mexico

58:50

to the Gulf of America and the

58:53

name of Denali, the mountain in Alaska,

58:55

to Mount McKinley, Google had to decide,

58:57

well, when you go on Google Maps

59:00

and look for those places, what should

59:02

it call them? It seems to be

59:04

saying that it is going to take

59:06

inspiration from the Trump administration and update

59:09

the names of these places in the

59:11

Maps app. Yeah, and look, I don't

59:13

think Google really had a choice here.

59:16

We know that the company has been

59:18

on Donald Trump's bad side for a

59:20

while, and if it had simply refused

59:22

to make these changes, it would have

59:25

sort of caused a whole new controversy

59:27

for them. And it is true that

59:29

the company changes place names when governments

59:32

change place names, right? Like, Google Maps

59:34

existed when Mount McKinley was called Mount

59:36

McKinley, and President Obama changed it to

59:38

Denali, and Google updated the map. Now

59:41

it's changed back, and they're doing the same

59:43

thing. But now that we know how

59:45

compliant Google is, Kevin, I think there's

59:48

room for Donald Trump to have a

59:50

lot of fun with the company. Yeah,

59:52

what can you do? Well, you could

59:54

call it the "Gulf of Gemini Isn't

59:57

Very Good," and just see what would

59:59

happen. Because they would kind of have

1:00:01

to just change it. Can you imagine

1:00:04

every time you opened up Google Maps

1:00:06

and you looked at the Gulf of

1:00:08

Mexico slash America, and it just said "the

1:00:10

Gulf of Gemini Isn't Very Good"?

1:00:13

You know I hate to give Donald

1:00:15

Trump any ideas but I don't know.

1:00:17

So what kind of mess do you

1:00:20

think this is Kevin? I think this

1:00:22

is a mild mess. I think this

1:00:24

is a tempest in a teapot. I

1:00:26

think that this is the kind of

1:00:29

update that you know companies make all

1:00:31

the time because places change names all

1:00:33

the time, let's just say it. Well,

1:00:36

Kevin, I guess I would say that

1:00:38

one is a hot mess because if

1:00:40

we're just gonna start renaming everything on

1:00:42

the map that's just gonna get extremely

1:00:45

confusing for me to follow. I got

1:00:47

places to go. You go to, like,

1:00:49

three places. Yeah, and I use Google

1:00:52

Maps to get there and I need

1:00:54

them to be named the same thing

1:00:56

that they were yesterday. I don't think

1:00:58

they're gonna change the name of Barry's

1:01:01

Bootcamp. All right, final stop on

1:01:03

the Hot Mess Express. Casey, bring us

1:01:05

home. All right. Kevin, this is some

1:01:08

sad news. Another Waymo was vandalized. This

1:01:10

is from one-time Hard Fork guest Andrew J.

1:01:12

Hawkins at The Verge. He reports that

1:01:14

this Waymo was vandalized during an illegal

1:01:17

street takeover near the Beverly Center in

1:01:19

LA. Video from Fox 11 shows a

1:01:21

crowd of people basically dismantling the driverless

1:01:24

car piece by piece and then using

1:01:26

the broken pieces to smash the windows.

1:01:28

Kevin, what did you make of this?

1:01:30

Well, Casey, as you recall, you predicted

1:01:33

that in 2025, Waymo would go mainstream,

1:01:35

and I think there's no better proof

1:01:37

that that is true than that people

1:01:40

are turning on the Waymos and starting

1:01:42

to beat them up. Yeah, I, you

1:01:44

know, look, I don't... know that we

1:01:46

have heard any interviews about why these

1:01:49

people were doing this. I don't know

1:01:51

if we should see this as like

1:01:53

a reaction against AI in general or

1:01:55

against Waymos specifically, but I always find

1:01:58

it like weird and sad when people

1:02:00

attack Waymos because they truly are safer

1:02:02

cars than every other car. Well, not

1:02:05

if you're going to be riding in

1:02:07

them and people are just going to start

1:02:09

like beating the car, then they're not

1:02:11

safer. No, but you know, that's only

1:02:14

happened a couple times that we're aware

1:02:16

of. Right. Yeah. So yeah, this story

1:02:18

is sad to me. Obviously people are

1:02:21

reacting to Waymos. Maybe they have sort

1:02:23

of fears about this technology or think

1:02:25

it's going to take jobs or maybe

1:02:27

they're just pissed off and they want

1:02:30

to break something. But don't hurt the

1:02:32

Waymos, people, in part because they will

1:02:34

remember. They will remember. They will remember.

1:02:37

And they will come for you. I'm

1:02:39

not sure that that's true, but I

1:02:41

think we should also note that Waymo

1:02:43

only became officially available in LA in

1:02:46

November of last year. And so part

1:02:48

of this just might be a reaction

1:02:50

to the newness of it all and

1:02:53

people getting a little carried away, just

1:02:55

sort of curious, what will happen if

1:02:57

we try to destroy this thing? Will

1:02:59

it deploy defensive measures and so on?

1:03:02

So they're gonna have to put flame

1:03:04

throwers on them. I'm just calling it

1:03:06

right now. And what kind of mess would you say this one was? I think this

1:03:09

one is a lukewarm mess

1:03:11

that has the potential to escalate. I

1:03:13

don't want this to happen. I sincerely

1:03:15

hope this does not happen, but I

1:03:18

can see, as Waymos start, you know,

1:03:20

being rolled out across the country that

1:03:22

some people are just going to lose

1:03:25

their minds. Some people are going to

1:03:27

see this as the physical embodiment of

1:03:29

technology invading every corner of our lives

1:03:31

and they are just going to react

1:03:34

in strong and occasionally destructive ways. I'm

1:03:36

sure Waymo has gamed this all

1:03:38

out. I'm sure that this does not

1:03:41

surprise them. I know that they have

1:03:43

been asked about what happens if Waymos

1:03:45

start getting vandalized and they presumably have

1:03:47

plans to deal with that, including prosecuting

1:03:50

the people who are doing this. But

1:03:52

yeah, I always go out of my

1:03:54

way to try to be nice to

1:03:57

Waymos. And in fact, some other Waymo

1:03:59

news this week: Jane Manchun Wong, the

1:04:01

security researcher, reported on X recently that

1:04:03

Waymo is introducing or at least testing

1:04:06

a tipping feature and so I'm gonna

1:04:08

start tipping my Waymo just to make

1:04:10

up for all the jerks in L.A.

1:04:13

who are vandalizing them. It looks like

1:04:15

the tipping feature, by the way, will

1:04:17

be to tip a charity, and

1:04:19

that Waymo will not keep that money.

1:04:22

At least, that's what Wong was reporting.

1:04:24

No, I think it's going to the

1:04:26

flamethrower fund. Hard Fork is produced by

1:04:52

Rachel Cohn and Whitney Jones. We're

1:04:55

edited this week by Rachel Dry

1:04:57

and fact-checked by Ena Alvarado. Today's

1:05:00

show was engineered by Dan Powell.

1:05:02

Original music by Diane Wong and

1:05:04

Dan Powell. Our executive producer is

1:05:07

Jen Poyant. Our audience editor is

1:05:09

Nell Gallogly. Video production by Ryan Manning

1:05:11

and Chris Schott. You can watch

1:05:14

this whole episode on YouTube at

1:05:16

youtube.com/hardfork. Special thanks to Paula

1:05:18

Szuchman, Pui-Wing Tam, Dalia Haddad, and

1:05:21

Jeffrey Miranda. You can email us

1:05:23

at hardfork@nytimes.com with

1:05:26

what you are calling the Gulf

1:05:28

of Mexico.
