Episode Transcript
0:00
I just got my weekly, you know, I
0:02
set up ChatGPT to email me a
0:04
weekly affirmation before we start taping, because
0:06
you can do that now with the
0:09
tasks feature. Yeah, people say this is
0:11
the most expensive way to email yourself
0:13
a reminder. So what sort of affirmation
0:15
did we get? Today it said, you
0:18
are an incredible podcast host, sharp, engaging,
0:20
and completely in command of the mic.
0:22
Your taping today is going to be
0:24
phenomenal, and you're going to absolutely kill
0:27
it. Wow, and that's why it's so
0:29
important that ChatGPT can't actually listen
0:31
to podcasts, because I don't think
0:33
it would say that if it
0:35
actually ever heard us. It would say, just
0:37
get this over with. Get on with
0:39
it! I'm Kevin Roose, a
0:42
tech columnist at the New York Times.
0:44
I'm Casey Newton from Platformer. And this
0:46
is Hard Fork. This week, we go
0:48
deeper on Deep Seek. China Talk's Jordan
0:50
Schneider joins us to break down the
0:52
race to build powerful AI. Then, hello,
0:55
operator. Kevin and I put OpenAI's
0:57
new agent software to the test. And
0:59
finally, the train is coming back to
1:01
the station for a round of hot
1:03
mess express. Well,
1:12
Casey, it is rare that we spend two
1:14
consecutive episodes of this show talking
1:16
about the same company, but I
1:18
think it is fair to say
1:20
that what is happening with Deep
1:22
Seek has only gotten more interesting
1:24
and more confusing. Yeah, that's right.
1:26
It's hard to remember a story
1:28
in recent months, Kevin, that has
1:31
generated quite as much interest as
1:33
what is going on with Deep
1:35
Seek. Now, Deep Seek for anyone
1:37
catching up is this relatively new
1:39
Chinese AI startup that released some
1:41
very impressive and cheap AI models
1:43
this month that lots of Americans
1:45
have started downloading and using. Yeah, so
1:47
some people are calling this a Sputnik
1:49
moment for the AI industry when kind
1:52
of every nation perks up and starts,
1:54
you know, paying attention at the same
1:56
time to the AI arms race. Some
1:59
people are saying this is the biggest
2:01
thing to happen in AI since the
2:03
release of ChatGPT. But Casey, why
2:06
don't you just catch us up on
2:08
what has been happening since we recorded
2:10
our emergency podcast episode just two days
2:12
ago. Well I would say that there
2:15
have probably been three stories Kevin that
2:17
I would share to give you a
2:19
quick flavor of what's been going on.
2:22
One, a market research firm says Deep
2:24
Seek was downloaded 1.9 million times on
2:26
iOS in recent days and about 1.2
2:29
million times on the Google Play store.
2:31
The second thing I would point out
2:33
is that Deep Seek has been banned
2:36
by the US Navy over security concerns,
2:38
which I think is unfortunate, because what
2:40
is a submarine doing, if not Deep
2:42
Seeking? It was also banned in Italy,
2:45
by the way, after the data protection
2:47
regulator made an inquiry. And finally, Kevin,
2:49
Open AI says that there is evidence
2:52
that Deep Seek distilled its models. Distillation
2:54
is kind of the AI lingo or
2:56
euphemism for they used our API to
2:59
try to unravel everything we were doing
3:01
and use our data in ways that
3:03
we don't approve of. And now Microsoft
3:05
and OpenAI are jointly investigating
3:08
whether Deep Seek abused their API.
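(A quick aside for readers: here is a minimal sketch of what API-based distillation can look like, assuming a generic chat-completions API. The client, model name, and prompts below are illustrative placeholders, not anything Deep Seek is confirmed to have done.)

```python
# Hypothetical sketch: collect (prompt, answer) pairs from a "teacher" model
# behind a paid API, to later fine-tune a smaller "student" model on them.
from openai import OpenAI

client = OpenAI()  # assumes an API key is set in the environment

prompts = [
    "Explain chain-of-thought prompting in one paragraph.",
    "Summarize the plot of Moby-Dick in three sentences.",
]

training_pairs = []
for prompt in prompts:
    reply = client.chat.completions.create(
        model="gpt-4o",  # placeholder teacher model
        messages=[{"role": "user", "content": prompt}],
    )
    # Each (prompt, teacher answer) pair becomes one training
    # example for the student model.
    training_pairs.append({
        "prompt": prompt,
        "completion": reply.choices[0].message.content,
    })
```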
3:10
And of course we can only imagine how
3:12
OpenAI is feeling about the fact
3:15
that their data might have been used
3:17
without payment or consent. Oh yeah, must
3:19
be really hard to think that someone
3:22
might be out there trading AI models
3:24
on your data without permission. And I
3:26
want to acknowledge that literally every single
3:28
user on Bluesky already made this
3:31
joke, but they were all funny and
3:33
I'm so happy to repeat it here
3:35
on Hard Fork this week. Now Kevin,
3:38
as always when we talk about AI,
3:40
we have certain disclosures to make. I work at the New York Times, which is suing OpenAI and Microsoft over copyright
3:42
violations alleged related to the use of
3:45
their copyrighted data to train AI models.
3:47
I think that was good. That was
3:49
very good. And I'm in love with
3:51
a man who works at Anthropic. Now,
3:54
with that said, Kevin, we have even
3:56
further we want to go into the
3:58
Deep Seek story and we want to
4:01
do it with the help of Jordan
4:03
Schneider. Yes, we are bringing in the
4:05
big guns today because we wanted to
4:08
have a more focused discussion about Deep
4:10
Seek that is not about, you know,
4:12
the stock market or how the American
4:15
AI companies are reacting to this, but
4:17
is about one of the biggest sets
4:19
of questions that all of this raises,
4:21
which is what is China up to
4:24
with Deep Seek and AI more broadly?
4:26
Like what? What are the geopolitical implications
4:28
of the fact that Americans are now
4:31
obsessing over this Chinese-made AI app? What
4:33
does it mean for Deep Seek's prospects
4:35
in America? What does it mean for
4:38
their prospects in China? And how does
4:40
all this fit together from the Chinese
4:42
perspective? So... Jordan Schneider is our guest
4:44
today. He's the founder and editor-in-chief of
4:47
China Talk, which is a very good
4:49
newsletter and podcast about US-China tech policy.
4:51
He's been following the Chinese AI ecosystem
4:54
for years. And unlike a lot of
4:56
American commentators and analysts who were sort
4:58
of surprised by Deep Seek and what
5:01
they managed to pull off over the
5:03
last couple weeks, I'll say it. I
5:05
was surprised. Yeah, me too. But Jordan
5:07
has been following this company for a
5:10
long time and a big... focus of
5:12
China Talk, his newsletter and podcast, has
5:14
been translating literally what is going on
5:17
in China into English, making sense of
5:19
it for a Western audience, and keeping
5:21
tabs on all the developments there. So
5:24
perfect guest for this week's episode, and
5:26
I'm very excited for this conversation. Yes,
5:28
I have learned a lot from China
5:31
Talk in recent days as I've been
5:33
boning up on Deep Seek, so we're
5:35
excited to have Jordan here, and let's
5:37
bring them in. Jordan
5:41
Schneider, welcome to Hard Fork. Oh my God,
5:43
such a huge fan. This is such an
5:45
honor. We're so excited. I have learned truly
5:47
so much from you this week. And so
5:50
when we were talking about what to do
5:52
this week, we just looked at each other
5:54
and said, we have got to see if
5:56
Jordan can come on this podcast. Yeah. So
5:59
this has been a big week for Chinese
6:01
tech policy. Maybe the biggest week for Chinese
6:03
tech policy at least that I can remember
6:05
I realized that something important was happening last
6:08
weekend when I started getting texts from like
6:10
all of my non-tech friends being like what
6:12
is going on with Deep Seek and I
6:14
imagine you had a similar reaction because you
6:17
are a person who does constantly pay attention
6:19
to Chinese tech policy. So I've been running
6:21
China Talk for eight years and I
6:23
can get my family members to maybe
6:26
read like one or two editions a
6:28
year and the same exact thing happened
6:30
with me Kevin where all of a
6:32
sudden I got, oh my god Deep
6:34
Seek, like it's on the cover of
6:37
the New York Post, Jordan, you're so
6:39
clairvoyant, like maybe I should read you
6:41
more. I'm like, okay, thanks mom, I
6:43
appreciate that. Yeah, so I want to
6:45
talk about Deep Seek and what they
6:47
have actually done here, but I'm hoping
6:50
first that you can kind of give
6:52
us the basic lay of the land
6:54
of the sort of Chinese AI ecosystem,
6:56
because that's not an area where Casey
6:58
or I have spent a lot of
7:00
time looking, but tell us about Deep
7:03
Seek and sort of where it sits
7:05
in the overall Chinese industry.
7:07
So Deep Seek is a really odd... It
7:09
was born out of this very successful
7:11
quant hedge fund. The CEO of which
7:13
basically after ChatGPT was released was like
7:16
Okay, this is really cool. I want
7:18
to spend some money and some time
7:20
and some compute and hire some fresh
7:22
young graduates to see if we can
7:25
give it a shot to make our
7:27
own language models. And so a lot
7:29
of companies are out there building their
7:31
own large language models. What was the
7:34
first thing that happened that made you
7:36
think, oh, this one, this company is
7:38
actually making some interesting ones. Sure,
7:40
so there are lots and lots
7:43
of very moneyed Chinese companies
7:45
that have been trying to follow
7:48
a similar path after ChatGPT. You
7:50
know, we have giant players like
7:52
Alibaba, Tencent, ByteDance,
7:55
Huawei even, trying to, you know,
7:57
create their own OpenAI, basically.
7:59
And what is remarkable is the
8:01
big organizations can't quite get their head
8:04
around creating the right organizational institutional structure
8:06
to incentivize this type of collaboration and
8:08
research that leads to real breakthroughs. So
8:11
Chinese firms have been releasing models for
8:13
years now, but Deep Seek because of
8:15
the way that it structured itself and
8:18
the freedom they had not necessarily being
8:20
under a direct profit motive, they were
8:22
able to put out some really remarkable
8:25
innovations that caught the world's attention, you
8:27
know, starting maybe late December, and then,
8:29
you know, really blew everyone's mind with
8:32
the release of the R1 chatbot. Yeah,
8:34
so let's talk about R1 in just
8:36
a second, but one more question for
8:38
you, Jordan, about Deep Seek. What do
8:41
we know about their motivation here? Because
8:43
so much of what has been puzzling
8:45
American tech industry watchers over the last
8:48
week is that this is not a
8:50
company that has sort of an obvious
8:52
business model connected to its AI research.
8:55
We know why Google is developing AI
8:57
because it thinks it's going to make
8:59
the company Google much more profitable. We
9:02
know why Open AI is developing advanced
9:04
AI models. It does not seem obvious
9:06
to me, and I have not read
9:09
anything from people involved in Deep Seek,
9:11
about why they are actually doing this
9:13
and what their ultimate goal is. So
9:15
can you help us understand that? So,
9:18
um... We don't have a lot of
9:20
data, but my base case, which is
9:22
based on two extended interviews that the
9:25
Deep Seek CEO released, which we've translated
9:27
on China Talk, as well as
9:29
just like what Deep Seek employees have
9:32
been tweeting about in the West, and
9:34
then domestically, is that they're dreamers. I
9:36
think the right mental model is open
9:39
AI, you know, 2017 to 2022. Like
9:41
I'm sure you could ask the same
9:43
thing, like, what the hell are they
9:46
doing? They literally said, I have no idea
9:48
how we're ever going to make money,
9:50
right? And here we are in this
9:52
grand new paradigm. So I really think
9:55
that they do have this like vision
9:57
of AGI and like, look, we'll build
9:59
it and we'll make it cheaper for
10:02
everyone, and we'll figure it out later.
10:04
And now they're going to get held up
10:09
alongside a ByteDance or Ali
10:16
or Tencent or Huawei and the government's
10:18
going to start to pay attention in
10:20
a way which it really hasn't over
10:23
the past few years. Right and I
10:25
want to I want to drill down
10:27
a little bit there because I think
10:29
one thing that most listeners in the
10:32
West do know about Chinese tech companies
10:34
is that many of them are sort
10:36
of inextricably linked to the Chinese government
10:39
that the Chinese government has access to
10:41
user data under Chinese law, that these
10:43
companies have to follow the Chinese censorship
10:46
guidelines. And so as soon as Deep
10:48
Seek started to really pop in America
10:50
over the last week, people started typing
10:53
in things to Deep Seek's model, like
10:55
tell me about what happened at Tiananmen
10:57
Square or tell me about Xi Jinping
11:00
or tell me about the great leap
11:02
forward. And it just sort of wouldn't
11:04
do it at all. And so people
11:06
I think saw that and said, oh,
11:09
this is. This is like every other
11:11
Chinese company that has this sort of
11:13
hand-in-glove relationship with the Chinese ruling party.
11:16
But it sounds from what you're saying,
11:18
like Deep Seek has a little bit
11:20
more complicated a relationship to the Chinese
11:23
government than maybe some other better-known Chinese
11:25
tech companies. So explain that. Yeah, I
11:27
mean I think it's important
11:30
like the mental model you should have
11:32
for these CEOs are not like people
11:34
who are dreaming to spread Xi Jinping
11:37
Thought, like what they want to
11:39
do is compete with Mark Zuckerberg and
11:41
Sam Altman and show that they're like
11:43
really awesome technologists. But the tragedy
11:46
is, is let's take ByteDance for
11:48
example, you can look at Zhang Yiming, their
11:50
CEO's Weibo posts from 2012, 2013, 2014,
11:53
which are super liberal in a Chinese
11:55
context, saying like, you know, we should
11:57
have freedom of expression, like we should
12:00
be able to do whatever we want.
12:02
And in the early years of ByteDance,
12:04
there was a lot of relatively more
12:07
subversive content on the platform where you
12:09
sort of saw like real poverty in
12:11
China, you saw off-color jokes. And then
12:14
all of a sudden in 2018, he
12:16
posts a letter saying, I am really
12:18
sorry, like, I need to be part
12:21
of this sort of like Chinese national
12:23
project and like better adhere to, you
12:25
know, modern Chinese socialist values, and I'm
12:27
really sorry, and it won't ever happen
12:30
again. You know, the same thing happened
12:32
with DiDi, right? Like, they don't really
12:34
want to have anything to do with
12:37
politics, and then they get on someone's
12:39
bad side, and all of a sudden they
12:41
get zapped. So they listed on the
12:44
Western Stock Exchange after the Chinese government
12:46
told them not to and then they
12:48
got taken off app stores and it
12:51
was a whole giant nightmare, like they
12:53
had to sort of go through their
12:55
rectification process. So point being with Deep
12:58
Seek right is like now they are
13:00
whether they like it or not going
13:02
to be held up as a national
13:04
champion and that comes with a lot
13:07
of headaches and responsibilities from you know
13:09
potentially giving the Chinese government more access
13:11
you know having to fulfill government contracts
13:14
which like honestly are probably really annoying
13:16
for them to do and sort of
13:18
distracting from the broader mission
13:21
they have of developing and deploying this
13:23
technology in the widest range possible but
13:25
like Deep Seek thus far has flown
13:28
under the radar but that is no
13:30
longer the case and things are about
13:32
to change for them. Right and I
13:35
think that was one of the surprising
13:37
things about Deep Seek for the people
13:39
I know including you who follow Chinese
13:41
tech policy is you know I think
13:44
people were surprised by the sophistication of
13:46
their models. And we talked about that
13:48
on the emergency pod that we did
13:51
earlier this week and how cheaply they
13:53
were trained. But I think the other
13:55
surprise is that they were released as
13:58
open source software because one thing that
14:00
you can do with open source software
14:02
is download it, host it in another
14:05
country, remove some of the guardrails and
14:07
the censorship filters that might have been
14:09
part of the original model. But by
14:12
the way, it turned out there weren't
14:14
even really guardrails on the
14:16
V3 model, right? That it had not
14:18
been trained to avoid questions about Tiananmen
14:21
Square or anything. So that was another
14:23
really unusual thing about this. Right. And
14:25
one thing that we know about Chinese
14:28
technology products is that they don't tend
14:30
to be released that way. They tend
14:32
to be hosted in China and overseen
14:35
by Chinese teams who can
14:37
make sure that they're not out there
14:39
talking about Tiananmen Square. Is the open
14:42
source nature of what Deep Seek has
14:44
done here part of the reason that
14:46
you think there might be conflict looming
14:49
between them and the Chinese government? You
14:51
know, honestly, I think this whole ask
14:53
it about Tiananmen stuff is a bit
14:55
of a red herring on a few
14:58
dimensions. So first, one of these like...
15:00
arguments that's a little sort of
15:02
confusing to me is like folks used
15:05
to say oh like the Chinese models
15:07
are going to be lobotomized and like
15:09
they will never be as smart as
15:12
the Western ones because like they have
15:14
to be politically correct. I mean look
15:16
if you ask Claude to say racist
15:19
things it won't and Claude's still pretty
15:21
smart. Like this is sort of a
15:23
solved problem in a bit of a
15:26
red herring when talking about sort of
15:28
long-term competitiveness of Chinese and Western models.
15:30
Now you asked me like oh so
15:32
they released this model globally
15:35
and it's open source, maybe someone in
15:37
the Chinese government would be uncomfortable with
15:39
the fact that people can get a
15:42
Chinese model to say things that would
15:44
get you thrown in jail if you
15:46
posted them online in China. It's going
15:49
to be a really interesting calculus for
15:51
the Chinese government to make because on
15:53
the one hand, this is the most
15:56
positive shine that Chinese AI has got
15:58
globally in the history of Chinese AI.
16:00
On the other hand, they have to navigate this and it might
16:03
prompt some uncomfortable conversations and bring regulators
16:05
to a place they wouldn't have
16:07
otherwise landed. Yeah. Jordan, I want to
16:09
ask you about something that people have
16:11
been talking about and speculating about in
16:14
relationship to the Deep Seek news for
16:16
the last week or so, which is
16:18
about chip controls. So we've talked a
16:21
little bit on the show earlier this
16:23
week about how Deep Seek managed to
16:25
put together these models. using some of
16:28
these kind of second-rate chips from Nvidia
16:30
that are allowed to be exported to
16:32
China. We've also talked about the fact
16:34
that you cannot get the most powerful
16:37
chips legally if you are a
16:39
Chinese tech company. So there have
16:41
been some people, including Elon Musk
16:43
and other American tech luminaries, who
16:45
have said, oh, well, Deep Seek
16:47
has this sort of secret stash
16:50
of these banned chips that they
16:52
have smuggled into the country, and
16:54
that actually they are not making
16:56
do with kind of the Kirkland
16:58
Signature chips that they say they
17:00
are. What do we know about
17:02
how true that is? So, did
17:05
Deep Seek have banned chips? It's
17:07
kind of impossible to know. This is
17:09
a question more for the US intelligence
17:11
community than like Jordan Schneider on Twitter.
17:13
But I do think that it is
17:15
important to understand that the delta between
17:17
what you can get in the West
17:19
and what you can get in China
17:22
is actually not that big. And we're
17:24
talking about training a lot, but also
17:26
on the inference side, China can still
17:28
buy this H20 chip from Nvidia,
17:30
which is basically world-class at like
17:32
deploying the AI and letting everyone
17:34
use it. So does this mean
17:36
that we should just give up?
17:38
I don't think so. Compute is
17:40
going to be a core input
17:42
regardless of how much model distillation
17:44
you're going to have in the
17:46
future. There have been a lot
17:49
of quotes even from the Deep Seek
17:51
founder basically saying like the one
17:53
thing that's holding us back are these
17:55
export controls. Right. Okay, I want
17:57
to ask a big-picture question. Sure.
18:00
I think that a reason that people have
18:02
been so fascinated by this Deep Seek
18:04
story is that at least for
18:06
some folks it seems to change
18:09
our understanding of where China is
18:11
in relation to the United States
18:13
when it comes to developing very
18:15
powerful AI. Jordan what is your
18:18
assessment of what the V3 and
18:20
R1 models mean and to what
18:22
extent do you think the game
18:24
has actually changed here? I'm
18:27
not really sure the game has
18:29
changed so much. Like Chinese engineers
18:31
are really good. I think it
18:33
is a reasonable base case that...
18:35
Chinese firms will be able to
18:37
develop comparable or fast follow on
18:39
the model side. But the real
18:42
sort of long-term competition is not
18:44
just going to be on developing
18:46
the models, but deploying them and
18:48
deploying them at scale. And that's
18:50
really where compute comes in, and
18:52
that's why export controls are going
18:54
to continue to be a really
18:56
important piece of America's strategic arsenal
18:59
when it comes to making sure
19:01
that the 21st century is defined
19:03
by the US and our friends
19:05
as opposed to China and theirs.
19:07
Right. So it's one thing to
19:09
have a model that is about
19:11
as capable as the models that
19:13
we have here in the United
19:16
States. It's another thing to have
19:18
the energy to actually let everyone
19:20
use them as much as they
19:22
want to use them. What you're
19:24
saying is no matter what Deep
19:26
Seek may have invented here, that
19:28
fundamental dynamic has not changed. China
19:30
simply does not have nearly the
19:33
amount of compute that the United
19:35
States has, as long as we
19:37
don't screw up export controls. So
19:39
I think the sort of base
19:41
case for me is that if
19:43
the US stays serious about holding
19:45
a line on semiconductor manufacturing equipment
19:47
and export of AI chips, then
19:50
it will be incredibly difficult for
19:52
the Chinese sort of broader semiconductor
19:54
and AI ecosystem to leap ahead
19:56
much less kind of like fast
19:58
follow beyond being able to develop
20:00
comparable models. I'm feeling good as
20:02
long as you know, Trump doesn't
20:05
make some like crazy deal for soybeans
20:07
in exchange for ASML EUV machines.
20:09
That would really break my heart.
20:11
I want to inject kind of
20:13
a note of skepticism here because
20:15
I buy everything that you're saying
20:17
about how Deep Seek's progress has
20:19
been sort of bottlenecked by the
20:22
fact that it can't get these
20:24
very powerful American AI chips from
20:26
companies like Nvidia. But I also
20:28
am hearing people who I trust
20:30
say things that make me think
20:32
that actually the bottleneck may not
20:34
be the availability of chips that
20:36
maybe with some of these algorithmic
20:39
efficiency breakthroughs that Deep Seek and
20:41
others have been making, it might
20:43
be possible to run a very
20:45
very powerful AI model on a
20:47
conventional piece of hardware, on a
20:49
MacBook even. And I wonder
20:51
about... How much of this is
20:53
just like AI companies in the
20:56
West trying to cope, trying to
20:58
make themselves feel better, trying to
21:00
reassure the market that they are
21:02
still going to make money by
21:04
investing billions and billions of dollars
21:06
into building powerful AI systems? If
21:08
these models do just become sort
21:10
of lightweight commodities that you can
21:13
run on a much less powerful
21:15
cluster of computers or maybe on
21:17
one computer, doesn't that just mean
21:19
we can't control the proliferation of
21:21
them at all? Yeah, I mean,
21:23
I think this is like this
21:25
is one potential future and maybe
21:27
that potential future like went up
21:30
10 percentage points of likelihood of
21:32
like you being able to fit
21:34
the biggest, baddest, smartest, fastest, most
21:36
efficient AI model on something that
21:38
can sit in your
21:40
home but I think there are
21:42
lots of other futures in which
21:44
sort of the world doesn't necessarily
21:47
play out that way and look
21:49
Nvidia went down 15%, it
21:51
didn't go down
21:53
95% like I think if we're
21:55
really in that world where chips
21:57
don't matter because everything can be
21:59
shrunk down to kind of consumer
22:02
grade hardware, then the sort of
22:04
reaction that I think you would
22:06
have seen in the stock market
22:08
would have been even more dramatic
22:10
than the kind of freak out
22:12
we saw over this week. So
22:14
we'll see. I mean, it would
22:16
be a really remarkable kind of
22:19
democratizing thing if that was the
22:21
future we ended up living in,
22:23
but it still seems pretty unlikely
22:25
to my, you know, like history
22:27
major brain here. I would also
22:29
just point out, Kevin, that when
22:31
you look at what Deep Seek
22:33
has done, they have created a
22:36
really efficient version of a model
22:38
that American companies themselves had trained
22:40
like nine to 12 months ago.
22:42
So they sort of caught up
22:44
very quickly. And there are fascinating
22:46
technological innovations in what they did.
22:48
But in my mind, these are
22:50
still primarily optimizations. Like for me,
22:53
what would tip me over into
22:55
like, oh my gosh, America is
22:57
losing this race is China is
22:59
the first one out of the
23:01
gate with a virtual co-worker, right?
23:03
Or like it's like a truly
23:05
phenomenal agent. Some sort of leap
23:07
forward in the technology as opposed
23:10
to we've caught up really quickly
23:12
and we've figured out something more
23:14
efficiently. Are you saying it differently
23:16
than that? I mean, I guess
23:18
I just don't know what like
23:20
a six-month lag would buy us
23:22
if it does take six months
23:24
for the Chinese AI companies like
23:27
Deep Seek to sort of catch
23:29
up to the state of the
23:31
art. You know, I was struck
23:33
by Dario Amodei, who's the CEO
23:35
of Anthropic, wrote an essay just
23:37
today about Deep Seek and export
23:39
controls, and in it he makes
23:41
this point about the sort of
23:44
difference between living in what he
23:46
called a unipolar world where one
23:48
country or one block of countries
23:50
has access to something like an
23:52
AGI or an ASI, and the
23:54
rest of the world doesn't. versus
23:56
the situation where China gets there
23:58
roughly around the same time. that
24:01
we do, and so we have
24:03
this bipolar world where two blocks
24:05
of countries, the East and the
24:07
West, basically have access to this
24:09
equivalent technology. And so- And of
24:11
course in a bipolar world, sometimes
24:13
we're very happy and sometimes we're
24:16
very sad. Exactly. So I just
24:18
think like, whether we get there,
24:20
you know, six months ahead of
24:22
them or not, I just feel
24:24
like there isn't that much of
24:26
a material difference. But Jordan, maybe
24:28
I'm wrong, can you make the
24:30
other side of that it really
24:33
does matter? I'm kind of there.
24:35
I, you know, I'll take a
24:37
little bit of issue with what
24:39
Dario says. And I think, you
24:41
know, what one of the lessons
24:43
that Deep Seek shows is we
24:45
should expect a base case of
24:47
Chinese model makers being able to
24:50
fast follow the innovations, which by
24:52
the way, Casey actually do take
24:54
those giant data centers to run
24:56
all the experiments in order to
24:58
find out, you know, what is
25:00
this sort of future direction you
25:02
want to take your model? And
25:04
what sort of AI is going
25:07
to come down to is not
25:09
just creating the model, not just
25:11
sort of like Dario envisioning the
25:13
future and then all of a
25:15
sudden, like, things happen. Like
25:17
there's going to be a lot
25:19
of messiness in the implementation and
25:21
there are going to be sort
25:24
of like teachers unions who are
25:26
upset that AI comes in the
25:28
classroom and there are going to
25:30
be so like all these regulatory
25:32
pushbacks and a lot of societal
25:34
reorganization which is going to need
25:36
to happen just like it did
25:38
during the industrial revolution. So look
25:41
model making is a frontier of
25:43
competition, but also this broader, like, how
25:45
will a society kind of adopt
25:47
and cope with all of this
25:49
new future that's going to be
25:51
thrown in our faces over the
25:53
coming years and I really think
25:55
it's that just as much as
25:58
the model development and the compute
26:00
which is going to determine which
26:02
countries are going to gain the
26:04
most from what AI is going
26:06
to offer us. Yeah well Jordan
26:08
Thank you so much for joining
26:10
and explaining all of this to
26:13
us. I feel more enlightened. Me
26:15
too. Oh, my pleasure. My chain
26:17
of thought has just gotten a
26:19
lot longer. That's an AI joke.
26:21
Let me cut back. Kevin, there's
26:23
an agent at our door. Is
26:25
it Jerry McGuire? No, it's an
26:27
AI one. Oh, okay. Jerry the
26:30
choir! I don't know! operator,
26:34
information, give me Jesus on the line.
26:36
Do you know that one? No. Do
26:39
you know Operator by Jim Croce? No.
26:41
Operator, oh, won't you help me place
26:43
this call? Well, Casey, call your agent,
26:45
because today we're talking about AI agents.
26:48
Why do I need to call my
26:50
agent? I don't know, it just sounded
26:52
good. Okay, well, I appreciate the effort,
26:55
but yes, Kevin, because... For months now,
26:57
the big AI labs have been telling
26:59
us that they are going to release
27:02
agents this year. Agents, of course, being
27:04
software that can essentially use your computer
27:06
on your behalf or use a computer
27:08
on your behalf. And the dream is
27:11
that you have sort of a perfect
27:13
virtual assistant or co-worker. You name it.
27:15
If they are somebody who might work
27:18
with you at your job, the AI
27:20
labs are saying, we are building that
27:22
for you. Yeah, so last year toward
27:25
the end of the year we started
27:27
to see kind of these demos, these
27:29
these previews that companies like Anthropic and
27:31
Google were working on. Anthropic released something
27:34
called computer use, which was an AI
27:36
agent, a sort of very early preview
27:38
of that. And then Google had something
27:41
called Project Mariner that I got a
27:43
demo of, I believe in December, that
27:45
was basically the same thing, but their
27:48
version of it. And then just last
27:50
week, Open AI announced that it was
27:52
launching Operator, which is its version of
27:55
an AI agent. And unlike Anthropic's and
27:57
Google's, which you know you either had
27:59
to be a developer or part of
28:01
some early testing program to access, you
28:04
and I could try it for ourselves
28:06
by just upgrading to the $200 a
28:08
month Pro subscription of ChatGPT. Yeah,
28:11
and I will say that as somebody
28:13
who's willing to spend money on software
28:15
all the time, I thought, am I
28:18
really about to spend $200 to do
28:20
this? But, you know, in the name
28:22
of science, Kevin, I had to. At
28:24
this point, I am spending more on
28:27
AI subscription products than on my mortgage.
28:29
I'm pretty sure that's correct. You know,
28:31
it's worth it. We do it for
28:34
journalism. We do. So we both spent
28:36
a couple of days putting operator through
28:38
its paces, and today we want to
28:41
talk a little bit about what we
28:43
found. Yeah, so would you just explain
28:45
like what Operator is and how it
28:47
works. Yeah, sure. So Operator is a
28:50
separate subdomain of ChatGPT. You
28:52
know, sometimes ChatGPT will just
28:54
let you pick a new model from
28:57
a drop-down menu. But for operator, you
28:59
got to go to a dedicated site.
29:01
Once you do, you'll see a very
29:04
familiar chatbot interface, but you'll see different
29:06
kinds of suggestions that reflect some of
29:08
the partnerships that OpenAI has struck
29:11
up. So for example, they have partnerships
29:13
with OpenTable and StubHub
29:15
and AllRecipes, meant to give you
29:17
an idea of what operator can do.
29:20
And frankly Kevin, not a lot of
29:22
this sounds that interesting, right? Like the
29:24
suggestions are on the order of
29:27
suggest a 30-minute meal with chicken or
29:29
reserve a table for eight or find
29:31
the most affordable passes to the Miami
29:34
Grand Prix. Again, so far, kind of
29:36
so boring. What is... different about operator,
29:38
though, is that when you say, okay,
29:40
find the most affordable passes to the
29:43
Miami Grand Prix, when you hit the
29:45
enter button, it is going to open
29:47
up its own web browser and it's
29:50
going to use this new model that
29:52
they have developed to try to actually
29:54
go and get those passes for you.
29:57
Yeah, so this is an important thing
29:59
because I think, you know, When people
30:01
first heard about this, they thought, okay,
30:04
this is an AI that kind of
30:06
takes over your computer, takes over your
30:08
web browser, that is not what operator
30:10
does. Instead, it opens a new browser
30:13
inside your browser, and that browser is
30:15
hosted on open AI servers. It doesn't
30:17
have your bookmarks and stuff like that saved,
30:19
but you can take it over from the
30:21
autonomous AI agent if you need to click
30:23
around or do something on it. But it
30:26
basically exists. It's a browser
30:28
within a browser. Yeah. So one of
30:30
the ideas of Operator is that you should
30:32
be able to leave it unsupervised and just
30:34
kind of go do your work while it
30:36
works. But of course it is very fun
30:38
initially at least to watch the computer try
30:40
to use itself. And so I sat there
30:42
in front of this browser within a browser
30:45
and I watched this computer move a mouse
30:47
around. type the, you know, URL,
30:49
navigate to a website, and, you
30:51
know, in the example I just
30:53
gave, actually search for passes to
30:55
the Miami Grand Prix. Yeah, and
30:57
it's interesting on a slightly more
30:59
technical level because until now, if
31:01
an AI system like ChatGPT
31:03
wanted to interact with some
31:06
other website, it had to do
31:08
so through an API, right? APIs,
31:10
application programming interfaces are sort of
31:12
the way that computers talk to
31:14
each other, but what operator does
31:16
is essentially eliminate the need for
31:18
APIs because it can just click
31:20
around on a normal website that
31:22
is designed for humans and behave
31:25
like a human and you don't
31:27
need a special interface to do
31:29
that.
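(A quick sketch to make that contrast concrete, using the requests and Playwright libraries as stand-ins; Operator's internal stack is not public, and the URLs and selectors below are placeholders.)

```python
# Machine-to-machine: the site exposes a structured API and you call it.
import requests

resp = requests.get(
    "https://api.example.com/v1/search",   # placeholder endpoint
    params={"q": "walking tours in London"},
)
results = resp.json()

# Human-style: a browser agent drives the same site's normal web page,
# the one designed for people.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.example.com")    # placeholder site
    page.fill("input[name='q']", "walking tours in London")
    page.press("input[name='q']", "Enter")
    print(page.title())                     # read the result like a person would
    browser.close()
```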
31:31
Yeah, and now some people might hear that, Kevin, and start
31:33
screaming because what they will say
31:35
is APIs are so much more
31:37
efficient than what Operator is doing
31:40
here. APIs
31:42
are very structured. They're very fast.
31:44
They let computers talk to each
31:46
other without having to, for example, open
31:48
up a browser. But APIs have to be built. There
31:50
is a finite number of them. The reason
31:52
that Open AI is going through this exercise
31:55
is because they want a true general purpose
31:57
agent that can do anything for you whether
31:59
there is an API for it or
32:01
not. And maybe we should just pause
32:03
for a minute there and zoom out
32:05
a little bit to say, why are
32:08
they building this? Like, what is the
32:10
long-term vision here? Sure. So the vision
32:12
is to create virtual coworkers. Kevin, this
32:14
is the North Star for the big
32:16
AI labs right now. Many have them
32:18
have said that they are trying to
32:20
create some kind of digital entity that
32:23
you can just hire as a co-worker.
32:25
The first ones, they'll probably be engineers
32:27
because these systems are already so good
32:29
at writing code,
32:31
but eventually they want to create virtual
32:33
consultants, virtual lawyers, virtual doctors, you name
32:35
it. Virtual podcast hosts? Let's hope they
32:37
don't go that far but everything else
32:40
is on the table and if they
32:42
can get there you know presumably that
32:44
there are going to be huge profits
32:46
in it for them they're going to
32:48
potentially be huge productivity gains for companies
32:50
and then there's of course the question
32:52
of well what does this mean for
32:55
human beings and I think that's somewhat
32:57
murkier right and I think there's also
32:59
it also helps to justify the cost
33:01
of running these things, because $200
33:03
a month is a lot to pay for
33:05
a version of ChatGPT, but it's not
33:07
a lot to pay for a remote
33:09
worker. And if you could, say, use
33:12
the next version of operator, or maybe
33:14
two or three versions from now, to
33:16
say, replace a customer service agent or
33:18
someone in your billing department, that actually
33:20
starts to look like a very good
33:22
deal. Absolutely. Or even if I could
33:24
bring it into the realm of journalism,
33:27
Kevin, if I had a virtual research
33:29
assistant and I said, hey, I'm going
33:31
to write about this today, go find information about
33:33
this from the past couple of years
33:35
and maybe organize it in such a
33:37
way that I might you know write
33:39
a column based off of it like
33:41
yeah that's absolutely worth $200 a month
33:44
to me. Okay so Casey walk me
33:46
through something that you actually asked operator
33:48
to do for you and what it
33:50
did autonomously on its own. Sure I'll
33:52
maybe give like two examples like a
33:54
pretty good one and maybe a not
33:56
so good one. The pretty good one was,
33:59
and this was actually suggested by
34:01
Operator, to use TripAdvisor to look up walking
34:03
tours in London that I might want
34:05
to do the next time I'm in
34:07
London. When I did that. When are
34:09
you going to London? I'm not actually
34:11
going to London. Oh so you lied
34:14
to the AI? And not for the
34:16
first time. But here's what I'll say
34:18
if anybody wants to bring Kevin and
34:20
me to London, get in touch. We love
34:22
the city. Yep. So I said okay
34:24
Operator, sure, let's do it. Let's find me
34:26
some walking tours. I clicked that, and it
34:28
opened a browser. It
34:31
went to TripAdvisor, it searched for London
34:33
Walking Tours, it read the information on
34:35
the website, and then it presented it
34:37
to me, did that within a couple
34:39
of minutes. Now, on one hand, could
34:41
I have done that just as easily
34:43
by Googling? Could I probably have done
34:46
it even faster if I'd done it
34:48
myself? Sure. But if you're just sort
34:50
of interested in the technical feat that
34:52
is getting one of these models to
34:54
open a browser to navigate to a
34:56
website, read it
34:58
and share information, a computer using itself and
35:00
you know going around like typing things
35:03
and selecting things from drop-down menus yeah
35:05
it's sort of like you know if
35:07
you think it is cool to be
35:09
in a self-driving car like this is
35:11
that, but for your web browser. A self-driving
35:13
browser? It is a self-driving browser. So
35:15
that's the good example yes what was
35:18
another example so another example and this
35:20
was something else that open AI suggested
35:22
that we try was to try to
35:24
use operator to buy groceries and they
35:26
have a partnership with Instacart. And so
35:28
I thought, okay, they're gonna have like
35:30
sort of dialed this in so that
35:32
there's a pretty good experience. And so
35:35
I said, okay, let's go ahead and
35:37
buy groceries and I went to operator
35:39
and I said something like, hey, can
35:41
you help me buy groceries on Instacart?
35:43
And it said, sure. And here's what
35:45
it did. It opened up Instacart in
35:47
a browser, so far, so good. And
35:50
then it started searching for milk in
35:52
stores located in Des Moines, Iowa. Now,
35:54
you do not live in Des Moines,
35:56
Iowa, so why did it think that
35:58
you did? As best as I can
36:00
tell, the reason it did this is
36:02
that Instacart defaults to searching for
36:04
grocery stores in the local area and
36:07
the server that this instance of operator
36:09
was running on was in Iowa. Now,
36:11
if you were designing a grocery product
36:13
like Instacart, and Instacart does this, when
36:15
you first sign on and say you're
36:17
looking for groceries, it will say quite
36:19
sensibly, where are you? Operator does not
36:22
do this. Instacart might also offer suggestions
36:24
for things that you might want to
36:26
buy. It does not just assume that
36:28
you want milk. Wow, I'm just picturing
36:30
like a house in Des Moines Iowa
36:32
where there's just like a pallet of
36:34
milk being delivered every day from all
36:36
these poor operator users. Yes. So I
36:39
thought, okay, whatever, you know, this thing
36:41
makes mistakes. Let's, let's hope that it
36:43
gets on the right track here. And
36:45
so I tried to pick the grocery
36:47
store that I wanted it to shop
36:49
at, which is, you know, in San
36:51
Francisco where I live, and it entered
36:54
that grocery store's address as the delivery
36:56
address. So like it would try to
36:58
deliver groceries presumably from Des Moines Iowa
37:00
to my grocery store, which is not
37:02
what I wanted. And it actually could
37:04
not solve this problem without my help.
37:06
I had to take over the browser,
37:08
log into my Instacart account, and tell
37:11
it which grocery store that I wanted
37:13
to shop at. So already, all of
37:15
this has taken at least 10 times
37:17
as long as it would have taken
37:19
me to do this myself. Yeah, so
37:21
I had some similar experiences. The first
37:23
thing that I had operator tried to
37:26
do for me was to buy a
37:28
domain name and set up a web
37:30
server for a project that you and
37:32
I are working on that we can't
37:34
really talk about yet. Secret project. Secret
37:36
project. And so I said to operator,
37:38
I said, go research available domain names
37:41
related to this project, buy the one
37:43
that costs less than $50. And then
37:45
buy hosting, and set it up
37:47
and configure all the DNS settings and
37:49
stuff like that. Okay, so that's like
37:51
a true multi-step project and something that
37:53
would have been legitimately very annoying to
37:55
do yourself. Yeah. That would have taken
37:58
me, I don't know, half an hour
38:00
to do on my own, and it
38:02
did take operator some time. I had
38:04
to kind of like set it and
38:06
forget it, and like I got myself
38:08
a snack and a cup of coffee,
38:10
and then when I came back, it
38:13
had done most of these tasks. Really,
38:15
yes, I had to still do things
38:17
like take over the browser and enter
38:19
my credit card number I had to
38:21
give it some details about like my
38:23
address for the sort of Registration for
38:25
the domain name I had to pick
38:27
between the various hosting plans that were
38:30
available on this website, but It did
38:32
90% of the work for me. And
38:34
I just had to sort of take
38:36
over and do the last mile. And
38:38
this is really interesting because what I
38:40
would assume was it would get like,
38:42
I don't know, 5% of the way
38:45
and it would hit some hiccup and
38:47
it just wouldn't be able to figure
38:49
something out until you came back and
38:51
saved it. But it sounds like from
38:53
what you're saying was, it was somehow
38:55
able to work around whatever unanswered questions
38:57
there were and still get a lot
38:59
done while you weren't paying attention. It
39:02
felt a little bit like training like
39:04
a very new very insecure intern because
39:06
like it at first it would keep
39:08
prompting me, like, well, do you
39:10
want a.com or a dot net? And
39:12
eventually you just have to prompt it
39:14
and say, like, make whatever decisions you
39:17
want. Like, wait, you said that to
39:19
it. Yes, I said, like, only ask
39:21
for my intervention if you can't progress
39:23
any farther, otherwise just make the most
39:25
reasonable decision. You said, I don't care
39:27
how many people you have to kill.
39:29
Just get me this domain. And it
39:31
said, understood, sir. Yeah, and it's now
39:34
wanted in 42 states. Anyway, that
39:36
was one thing that operator did for
39:38
me that was pretty impressive. That feels
39:40
like a grand success compared to what
39:42
I got operator to do. Yeah, it
39:44
was pretty impressive. I also had it
39:46
send lunch to one of my coworkers,
39:49
Mike Isaac, who was hungry, because he
39:51
was on deadline, and I said go
39:53
to DoorDash and get Mike some lunch.
39:55
It did initially mess up that process.
39:57
because it decided to send him tacos
39:59
from a taco place, which is great.
40:01
And it's a taco place, I know
40:03
it's very good. But I said, order
40:06
enough for two people and sort of
40:08
ordered two tacos. And this is one
40:10
of those places where the tacos are
40:12
quite small. Operator said, get your portion
40:14
size under control, America. Yeah, so then
40:16
I had to go in and say,
40:18
does that sound like enough food operator?
40:21
And it said, actually, now that you
40:23
mentioned it, I should probably order more.
40:25
Wait, no, so here's a question. So
40:27
in these cases, is the first
40:29
step that you log into your account, because
40:31
it doesn't have any of your payment details
40:33
or anything. So at what point are you
40:36
actually sort of teaching at that? It depends
40:38
on the website, so sometimes you can just
40:40
say up front like here is my email
40:42
address or here is my login information
40:44
and it will sort of you know
40:46
log you in and do all that.
40:48
Sometimes you take over the browser. There
40:50
are some privacy features that are probably
40:52
important to people where it says open
40:54
AI says that it does not take
40:56
screenshots of the browser while you are
40:58
in control of it because you might
41:00
not want your credit card information getting
41:02
sent to OpenAI servers or anything like
41:05
that. Sometimes it happens at the beginning of
41:07
the process, sometimes it happens like when you're
41:09
checking out at the end. And so were
41:11
you taking it over to log in or
41:14
were you saying, I don't care, and you
41:16
just like were giving Operator your DoorDash
41:18
password in plain text? I was taking it
41:20
over. Okay, smart. Yeah. So those were the
41:23
good things. I also, this was a fun
41:25
one. I wanted to see if Operator
41:27
could make me some money So I said
41:29
go take a bunch of online surveys because
41:31
you know there are all these websites where
41:33
you can like get a couple cents for
41:35
like filling out an online survey Something that
41:37
most people don't know about Kevin is he
41:40
devotes 10% of his brain at any given
41:42
time to thinking about schemes to generate money, and
41:44
it's one of my favorite aspects of your
41:46
personality that I feel like doesn't get exposed
41:48
very much. But this is truly the most
41:50
Roosian approach to using Operator I can imagine.
41:52
So I can't wait to find out how this went.
41:54
Well, the most Roosian approach might have been what I
41:56
tried just before this, which was to have it go
41:58
play online poker for me. But it did
42:01
not do it. It said I
42:03
can't help with gambling or lottery
42:05
related activities. Okay, Woke AI. Does
42:07
the Trump administration know about this?
42:09
But it was able to actually
42:11
fill out some online surveys for
42:13
me and it earned a dollar
42:15
and 20 cents. Is that right?
42:17
Yeah, in about 45 minutes. So
42:19
if you had it going all
42:21
month, presumably you could maybe eke
42:23
out the $200 to cover the
42:25
cost of operator pro? Yes, and
42:27
I'm sure I spent hundreds of
42:29
dollars worth of GPU computing power
42:31
just to be able to make
42:33
that dollar and 20 cents.
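(For the curious, the back-of-the-envelope math: this extrapolates only from the figures mentioned here, and assumes, generously, that the rate would hold steady around the clock.)

```python
# $1.20 earned in about 45 minutes of AI-powered survey-filling.
rate_per_hour = 1.20 / 0.75          # = $1.60 per hour
monthly = rate_per_hour * 24 * 30    # = $1,152 if it ran nonstop for 30 days

print(monthly >= 200)                # True: enough, on paper, for the $200 Pro plan
```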
42:35
But hey, it worked. So those were
42:37
some of the things that I
42:39
tried. There were some other things
42:41
that it just would not do
42:43
for me no matter how hard
42:45
I tried. So
42:47
one of them was, I
42:49
was trying to update my website
42:51
and put some links to articles
42:53
that I'd written on my website
42:55
and what I found after trying
42:57
to do this was that there
42:59
are just websites where operator is
43:01
not allowed to go. And so
43:03
when I said to operator, go
43:05
pull down these New York Times
43:07
articles that I wrote and, you
43:09
know, put them onto my website,
43:11
it said, I can't get to
43:13
the New York Times website. I'm
43:15
going to guess you expected that
43:17
to happen. Well, I thought maybe
43:19
it has some clever work around
43:21
and maybe I should alert the
43:23
lawyers at the New York Times,
43:26
if that's the case. But no,
43:28
I assumed that if any website
43:30
were to be blocking the open
43:32
AI web crawlers, it would be
43:34
the New York Times. There are
43:36
other websites that have also put
43:38
up similar blockades to prevent operator
43:40
from crawling them. Reddit, you
43:42
cannot go onto with Operator;
43:44
YouTube, you cannot go onto
43:46
with Operator; various other websites. GoDaddy
43:48
for some reason did not allow
43:50
me to use operator to buy
43:52
a domain name there, so I
43:54
had to use another domain name
43:56
site to do that. So right
43:58
now there are some pretty janky
44:00
parts of Operator. I would not
44:02
say that most people would get
44:04
a lot of value from using
44:06
it, but what do you think?
44:08
Well... I do think that
44:10
there is something just undeniably cool
44:12
about watching a computer use itself.
44:14
Of course, it can also be
44:17
quite unsettling. A computer that can
44:19
use itself can cause a lot
44:21
of harm. But I also think
44:23
that it can do a lot
44:25
of good. And so it was
44:27
fun to try to explore what
44:29
some of those things could be.
44:31
And to the extent that operator
44:33
is pretty bad at a lot
44:35
of tasks today, I would point
44:37
out that it showed pretty impressive
44:39
gains on some benchmark. So there
44:41
is one benchmark, for example, that
44:43
Anthropic used when they unveiled computer
44:46
use last year and they scored
44:48
14.9% on something called OSWorld,
44:50
which is an evaluation for testing
44:52
agents. So, not great. Just three
44:54
months later, Open AI said that
44:56
its CUA model scored 38.1% on
44:58
the same evaluation. And of course,
45:00
we see this all the time
45:02
in AI where there's just this
45:04
very rapid progress on these benchmarks.
45:06
And so on one hand, 38.1%
45:08
is a failing grade on basically
45:10
any test. On the other hand,
45:12
if it improves at the same
45:15
rate over the next three to
45:17
six months, you're going to have
45:19
a computer that is very good
45:21
at using itself, right? So that
45:23
I just think is worth noting.
45:25
Yes, I think that's plausible. We've
45:27
obviously seen a lot of different
45:29
AI products over the last couple
45:31
of years start out being pretty
45:33
mediocre and get pretty good within
45:35
a matter of months. But I
45:37
would give one cautionary note here.
45:39
And this is actually the reason
45:41
that I'm not particularly bullish about
45:44
these kind of browser using AI
45:46
agents. I don't think the internet
45:48
is going to sit still and
45:50
allow this to happen. The internet
45:52
is built for humans to use,
45:54
right? It is every news publisher.
45:56
that shows ads on their website,
45:58
for example, prices those ads based
46:00
on the expectation that humans are
46:02
actually looking at them. But if
46:04
browser agents start to become more
46:06
popular and all of a sudden
46:08
10 or 20 or 30% of
46:10
the visitors to your website are
46:13
not actually humans, but are instead
46:15
operator or some similar system, I
46:17
think that starts to break the
46:19
assumptions that power the economic model
46:21
of a lot of the internet.
46:23
Now is that still true if
46:25
we find that the agents actually
46:27
get persuaded by the ads and
46:29
that if you send operator to
46:31
buy DoorDash and it sees
46:33
an ad for McDonald's it's like
46:35
you know what that's a great
46:37
idea I'm gonna ask Kevin if
46:39
he actually wants some of that.
46:42
Totally, totally. You
46:44
think you're joking, but I actually
46:46
think that is a serious possibility
46:48
here is that people who, you
46:50
know, build e-commerce sites, Amazon, etc.
46:52
start to put in basically signals
46:54
and messages for browser agents to
46:56
look at on their website to
46:58
try to influence what it ends
47:00
up buying. And I think you
47:02
may start to see restaurants popping
47:04
up in certain cities with names
47:06
like Operator, Pick Me or Order
47:08
From This One, Mister. That's maybe
47:11
a little extreme, but I do
47:13
think that there's going to be
47:15
a backlash among websites, publishers, e-commerce
47:17
vendors as these agents start to
47:19
take off. I think that that
47:21
is reasonable. I'll tell you what
47:23
I've been thinking about is how
47:25
do we turn this tech demo
47:27
into a real product? And the
47:29
main thing that I noticed when
47:31
I was testing operator was there
47:33
is a difference between an agent
47:35
that is using a browser and
47:37
an agent that is using your
47:40
browser. When an agent is able
47:42
to use your browser, which it
47:44
can't right now, it's already logged
47:46
into everything, and things go faster and more seamlessly
47:48
and without as much hand-holding. Of
47:50
course, there are also so many
47:52
more privacy and security risks that
47:54
would come from entrusting an agent
47:56
with that kind of information. So
47:58
there is some sort of chasm
48:00
there that needs to be closed
48:02
and I'm not quite sure how
48:04
anyone does it, but I will
48:06
tell you I do not think
48:08
the future is opening up these
48:11
virtual browsers and me having to
48:13
enter all of my login and
48:15
payment details every single time I
48:17
want to do anything on the
48:19
internet because truly I would rather
48:21
just do it myself. Right. I
48:23
also think there's just a lot
48:25
more potential for harm here. A
48:27
lot of AI safety experts I've
48:29
talked to are very worried about
48:31
this, because what you're essentially doing
48:33
is letting the AI models make
48:35
their own decisions and actually carry
48:37
out tasks. And so you can
48:40
imagine a world where an AI
48:42
agent that's very powerful, a couple
48:44
versions from now, decides to start
48:46
doing cyber attacks because maybe some
48:48
malevolent user has told it to
48:50
make money and it decides that
48:52
the best way to do that
48:54
is by hacking into people's crypto
48:56
wallets and stealing their crypto. Yeah.
48:58
Those are the kinds of reasons
49:00
that I am a little more
49:02
skeptical that this represents a big
49:04
breakthrough. But I think it's
49:06
really interesting and it did give
49:09
me that feeling of like wow
49:11
this could get really good really
49:13
fast And if it does the
49:15
world will look very different. When
49:17
we come back, Kevin, back that
49:19
caboose up. It's time for the
49:21
Hot Mess Express. You know, Roose
49:23
Caboose was my nickname in middle
49:25
school. Kevin Caboose. Choo-choo! Well,
49:43
Casey, we're here wearing our train
49:45
conductor hats, and my child's train
49:47
set is on the table in
49:49
front of us, which can only
49:51
mean one thing. We're going to
49:54
train a large language model. Nope,
49:56
that's not what that means. It
49:58
means it's time to play a
50:00
game of the Hot Mess Express.
50:02
Pause for theme song. Hot Mess
50:04
Express, Kevin, is our segment where
50:06
we run through some of the
50:09
messiest recent tech stories and deploy
50:11
our official hot mess thermometer to
50:13
tell you just how messy we
50:15
think things have gotten and Kevin
50:17
you better sit down for this
50:19
one. So why don't we go
50:21
ahead, fire up the Hot Mess
50:24
Express and see what is the
50:26
first story? Yeah, I hear that
50:28
I hear a faint chug-a-chugga in
50:30
my headphones. Oh, it's pulling into
50:32
the station. Casey, what's the first
50:34
cargo that our Hot Mess Express
50:37
is carrying? All right, Kevin, this
50:39
first story comes to us from
50:41
the New York Times, and it
50:43
says that Fable, a book app,
50:45
has made changes after some offensive
50:47
AI messages. Okay, Casey, have you
50:49
ever heard of Fable, the book
50:52
app? Well, not until this story,
50:54
Kevin, but I am told that
50:56
it is an app for sort
50:58
of keeping track of what you're
51:00
reading, not unlike a Goodreads,
51:02
but also for discussing what you're
51:04
reading, and apparently this app also
51:07
offers some AI chat. Yeah, you
51:09
can have AI sort of summarize
51:11
the things that you're reading in
51:13
a personalized way. And this story
51:15
said that in addition to spitting
51:17
out bigoted and racist language, the
51:19
AI inside Fable's book app had
51:22
told one reader who had just
51:24
finished three books by black authors,
51:26
quote, your journey dives deep into
51:28
the heart of black narratives and
51:30
transformative tales, leaving mainstream stories gasping
51:32
for air. Don't forget to surface
51:34
for the occasional white author, okay?
51:37
And another personalized AI summary that
51:39
Fable produced told another reader that
51:41
their book choices were, quote, making
51:43
me wonder if you're ever in
51:45
the mood for a straight cis
51:47
white man's perspective. And if you
51:50
are interested in a straight cis
51:52
white man's perspective, follow Kevin Roose
51:54
on x.com. Now, Kevin, why do
51:56
we think this happened? I don't
51:58
know, Casey. This is a headscratcher
52:00
for me. I mean, we know
52:02
that these apps can spit out
52:05
biased things. That is just sort
52:07
of like part of how they
52:09
are trained and part of what
52:11
we know about them. I don't
52:13
know what model Fable was using
52:15
under the hood here, but yeah,
52:17
this seems not great. Well, it
52:20
seems like we've learned a lesson
52:22
that we've learned more than once
52:24
before, which is that large language
52:26
models are trained on the internet,
52:28
which contains near-infinite racism. So
52:30
there are mitigations that you can
52:32
take against that, but it appears
52:34
that in this case, they were
52:36
not successful. Fable's head of community,
52:39
Kim Marsh Alley, has said that
52:41
all features using AI are being
52:43
removed from the app, and a
52:45
new app version is being submitted
52:47
to the app store. So you
52:49
always hate it when the first
52:51
time you hear about an app
52:53
is that they added AI, and
52:55
it made it super racist, and
52:57
they have to redo the app.
52:59
Do you think this poses any sort of competitive threat
53:01
to Grok, which until this story was
53:04
the leading racist AI app on the
53:06
market? I do think so and I
53:08
have to admit that all the folks
53:10
over at Grok are breathing a sigh
53:12
of relief now that they have once
53:14
again claimed the mantle. All right, Casey,
53:16
how hot is this mess? Well, Kevin,
53:18
in my opinion, if your AI is
53:21
so bad that you have to remove
53:23
it from the app completely, that's a
53:25
hot mess. Yeah, I rate this one
53:27
a hot mess as well. All right,
53:29
next stop. Amazon pauses drone
53:31
deliveries after aircraft
53:33
crashed in rain. Casey,
53:35
this story comes to us
53:37
from Bloomberg, which had a different
53:40
line of reporting than we did
53:42
just a few weeks ago on
53:44
the show about Amazon's drone program
53:47
Prime Air. Casey, what happened to
53:49
Amazon Prime Air? If you heard
53:51
the episode of Hard Fork where
53:54
we talked about it, Amazon Prime
53:56
Air delivered us some Brazilian bumbum
53:58
cream and it... did so without
54:00
incident. However, Bloomberg reports that Amazon has
54:03
had to now pause all of their
54:05
commercial drone deliveries after two of its
54:07
latest models crashed in rainy weather at
54:10
a testing facility. And so the company
54:12
says it is immediately suspending drone deliveries
54:14
in Texas and Arizona and will now
54:16
fix the aircraft software. Kevin, how did
54:19
you react to this? Well, I think
54:21
it's good that they're suspending drone deliveries
54:23
before they fix the software because these
54:26
things are quite heavy, Casey. I would
54:28
not want one of them to fall
54:30
on my head. And I have to
54:32
tell you this story gave me the
54:35
worst kind of flashbacks because in 2016
54:37
I wrote about Facebook's drone, Aquila, and
54:39
what the company told me
54:42
had been its first successful test flight
54:44
in its mission to deliver internet around
54:46
the world via drone. What the company
54:48
did not tell me when I was
54:51
interviewing its executives, including Mark Zuckerberg, was
54:53
that the plane had crashed after that
54:55
first flight. And so... it was a
54:57
small detail. I'm sure it was an
55:00
innocent omission. Yes, I'm sure. Well, it
55:02
was Bloomberg again, who reported, you know,
55:04
a couple months after I wrote this
55:07
story, that the Facebook drone had crashed.
55:09
I was, of course, hugely embarrassed and,
55:11
you know, wrote a bunch of stories
55:13
about this. But anyways, it really should
55:16
have occurred to me when we were
55:18
out there watching the Amazon drone, that
55:20
this thing was also probably secretly crashing,
55:23
and we just hadn't found out about
55:25
it yet. And indeed, we now learned
55:27
it had. We have to ask them now,
55:29
did this thing actually crash? I'm tired
55:32
of being burned. Now Casey, we should
55:34
say, according to Bloomberg, these drones reportedly
55:36
crashed in December. We visited Arizona to
55:39
see them in very early December, so
55:41
most likely, you know, this all happened
55:43
after we saw them. But I think
55:45
it's a good idea to keep in
55:48
mind that as we're talking about these
55:50
new and experimental technologies, that many of
55:52
them are still having the kinks worked
55:55
out. All right Kevin, so let's get
55:57
out the thermometer. How hot of
55:59
a mess is this? I would say
56:01
this is a moderate mess. Look, these
56:04
are still testing programs. No one was
56:06
hurt during these tests. I am glad
56:08
that Bloomberg reported on this. I'm glad
56:11
that they've suspended the deliveries. These things
56:13
could be quite dangerous flying through the
56:15
air. I do think it's one of
56:17
a string of reported incidents with these
56:20
drones. So I think they've got some
56:22
quality control work ahead of them and
56:24
I hope they do well on it
56:27
because I want these things to exist
56:29
in the world and be safe for
56:31
people around them. All right. I will
56:33
agree with you and say that this
56:36
is a warm mess and hopefully you
56:38
can get straightened out over there. Let's
56:40
see what else is coming down the
56:43
tracks. Fitbit has agreed to pay $12
56:45
million for not quickly reporting burn risk
56:47
with watches. Kevin, do you hear about
56:49
this? I did. This was the one where Fitbit
56:52
devices were, like, literally burning people. Yes,
56:54
from 2018 to March of 2022, Fitbit
56:56
received at least a hundred and seventy
56:59
four reports globally of the lithium ion
57:01
battery in the Fitbit Ionic watch overheating,
57:03
leading to a hundred and eighteen reported
57:05
injuries, including two cases of third degree
57:08
burns and four of second degree burns.
57:10
That comes from the New York Times
57:12
Adeel Hassan. Kevin, I thought these things
57:15
were just supposed to burn calories. Well,
57:17
it's like I always say, exercising is
57:19
very dangerous and you should never do
57:21
it. And this justifies my decision not
57:24
to wear a Fitbit. To me,
57:26
the biggest surprise of this story was
57:28
that people were wearing Fitbits from
57:31
2018 to March 2022. I thought every
57:33
Fitbit had been purchased by, like, 2011
57:35
and then put in a drawer never
57:37
to be heard again. So what is
57:40
going on with these sort of late
57:42
stage Fitbit buyers? I'd love to find
57:44
out. But of course, we feel terrible
57:47
for everyone who was burned by a
57:49
Fitbit. And it's not going to be
57:51
the last time technology burns you. I
57:53
mean realistically. That's true. That's true. Now
57:56
what kind of mess is this? I
57:58
would say this is a hot mess.
58:00
This is officially hot, literally hot.
58:03
They're hot. Here's my sort of rubric:
58:05
If technology physically burns you, it is
58:07
a hot mess. If you have physical
58:09
burns on your body, what other kind
58:12
of mess could it be? It's true.
58:14
That's a hot mess. Okay, next stop
58:16
on the Hot Mess Express. Google says
58:19
it will change Gulf of Mexico to
58:21
Gulf of America in Maps app after
58:23
government updates. Casey, have you been following
58:25
this story? I have, Kevin, every morning
58:28
when I wake up I scan America's
58:30
maps and I say, what has been
58:32
changed? And if so, has it been
58:34
changed for political reasons? And this was
58:37
probably one of the biggest examples of
58:39
that we've seen. Yeah, so this was
58:41
an interesting story that came out in
58:44
the past couple of days. After President Trump, in his first days in
58:48
office, said that he was changing
58:48
the name of the Gulf of Mexico
58:50
to the Gulf of America and the
58:53
name of Denali, the mountain in Alaska,
58:55
to Mount McKinley, Google had to decide,
58:57
well, when you go on Google Maps
59:00
and look for those places, what should
59:02
it call them? It seems to be
59:04
saying that it is going to take
59:06
inspiration from the Trump administration and update
59:09
the names of these places in the
59:11
maps app. Yeah, and look, I don't
59:13
think Google really had a choice here.
59:16
We know that the company has been
59:18
on Donald Trump's bad side for a
59:20
while, and if it had simply refused
59:22
to make these changes, it would have
59:25
sort of caused a whole new controversy
59:27
for them. And it is true that
59:29
the company changes place names when governments
59:32
changed place names, right? Like Google Maps
59:34
existed when Mount McKinley was called Mount
59:36
McKinley, and President Obama changed it to
59:38
Denali, and Google updated the map. Now
59:41
it's changed back, and they're doing the same
59:43
thing. But now that we know how
59:45
compliant Google is Kevin, I think there's
59:48
room for Donald Trump to have a
59:50
lot of fun with the company. Yeah,
59:52
what can you do? Well, you could
59:54
call it "the Gulf of Gemini Isn't
59:57
Very Good" and just see what would
59:59
happen. Because they would kind of have
1:00:01
to just change it. Can you imagine
1:00:04
every time you opened up Google Maps
1:00:06
and you looked at the Gulf of
1:00:08
Mexico slash America, and it just said "the
1:00:10
Gulf of Gemini Is Not Very Good"?
1:00:13
You know I hate to give Donald
1:00:15
Trump any ideas but I don't know.
1:00:17
So what kind of mess do you
1:00:20
think this is Kevin? I think this
1:00:22
is a mild mess. I think this
1:00:24
is a tempest in a teapot. I
1:00:26
think that this is the kind of
1:00:29
update that, you know, companies make all
1:00:31
the time, because places change names all
1:00:33
the time, let's just say it. Well,
1:00:36
Kevin, I guess I would say that
1:00:38
one is a hot mess because if
1:00:40
we're just gonna start renaming everything on
1:00:42
the map, that's just gonna get extremely
1:00:45
confusing for me to follow. I got
1:00:47
places to go. You go to, like,
1:00:49
three places. Yeah, and I use Google
1:00:52
Maps to get there, and I need
1:00:54
them to be named the same thing
1:00:56
that they were yesterday. I don't think
1:00:58
they're gonna change the name of Barry's
1:01:01
Bootcamp. All right, final stop on
1:01:03
the Hot Mess Express. Casey, bring us
1:01:05
home. All right. Kevin, this is some
1:01:08
sad news. Another Waymo was vandalized. This
1:01:10
is from one-time Hard Fork guest Andrew J.
1:01:12
Hawkins at The Verge. He reports that
1:01:14
this Waymo was vandalized during an illegal
1:01:17
street takeover near the Beverly Center in
1:01:19
LA. Video from Fox 11 shows a
1:01:21
crowd of people basically dismantling the driverless
1:01:24
car piece by piece and then using
1:01:26
the broken pieces to smash the windows.
1:01:28
Kevin, what did you make of this?
1:01:30
Well, Casey, as you recall, you predicted
1:01:33
that in 2025, Waymo would go mainstream,
1:01:35
and I think there's no better proof
1:01:37
that that is true than that people
1:01:40
are turning on the Waymos and starting
1:01:42
to beat them up. Yeah, I, you
1:01:44
know, look, I don't... know that we
1:01:46
have heard any interviews about why these
1:01:49
people were doing this. I don't know
1:01:51
if we should see this as like
1:01:53
a reaction against AI in general or
1:01:55
Waymos specifically, but I always find
1:01:58
it like weird and sad when people
1:02:00
attack Waymos because they truly are safer
1:02:02
cars than every other car. Well, not
1:02:05
if you're going to be riding in
1:02:07
them and people are just going to start,
1:02:09
like, beating the car, then they're not
1:02:11
safer. No, but you know, that's only
1:02:14
happened a couple times that we're aware
1:02:16
of. Right. Yeah. So yeah, this story
1:02:18
is sad to me. Obviously people are
1:02:21
reacting to Waymos. Maybe they have sort
1:02:23
of fears about this technology or think
1:02:25
it's going to take jobs or maybe
1:02:27
they're just pissed off and they want
1:02:30
to break something. But don't hurt the
1:02:32
Waymos, people, in part because they will
1:02:34
remember. They will remember. They will remember.
1:02:37
And they will come for you. I'm
1:02:39
not sure that that's true, but I
1:02:41
think we should also note that Waymo
1:02:43
only became officially available in LA in
1:02:46
November of last year. And so part
1:02:48
of this just might be a reaction
1:02:50
to the newness of it all and
1:02:53
people getting a little carried away, just
1:02:55
sort of curious, what will happen if
1:02:57
we try to destroy this thing? Will
1:02:59
it deploy defensive measures and so on?
1:03:02
So they're gonna have to put flamethrowers
1:03:04
on them. I'm just calling it
1:03:06
right now. So what kind of mess do you think this one was? I think this
1:03:09
one is a lukewarm mess
1:03:11
that has the potential to escalate. I
1:03:13
don't want this to happen. I sincerely
1:03:15
hope this does not happen, but I
1:03:18
can see, as Waymos start, you know,
1:03:20
being rolled out across the country that
1:03:22
some people are just going to lose
1:03:25
their minds. Some people are going to
1:03:27
see this as the physical embodiment of
1:03:29
technology invading every corner of our lives
1:03:31
and they are just going to react
1:03:34
in strong and occasionally destructive ways. I'm
1:03:36
sure Waymo has gamed this all
1:03:38
out. I'm sure that this does not
1:03:41
surprise them. I know that they have
1:03:43
been asked about what happens if Waymos
1:03:45
start getting vandalized and they presumably have
1:03:47
plans to deal with that, including prosecuting
1:03:50
the people who are doing this. But
1:03:52
yeah, I always go out of my
1:03:54
way to try to be nice to
1:03:57
Waymos. And in fact, some other Waymo
1:03:59
news this week: Jane Manchun Wong, the
1:04:01
security researcher, reported on X recently that
1:04:03
Waymo is introducing or at least testing
1:04:06
a tipping feature and so I'm gonna
1:04:08
start tipping my Waymo just to make
1:04:10
up for all the jerks in L.A.
1:04:13
who are vandalizing them. It looks like
1:04:15
the tipping feature, by the way, will
1:04:17
be to tip a charity, and
1:04:19
that Waymo will not keep that money.
1:04:22
At least that's what Wong is reporting.
1:04:24
No, I think it's going to the
1:04:26
flamethrower fund. Hard Fork is produced
1:04:52
Rachel Cohn and Whitney Jones. We're
1:04:55
edited this week by Rachel Dry
1:04:57
and fact-checked by Ena Alvarado. Today's
1:05:00
show was engineered by Dan Powell.
1:05:02
Original music by Diane Wong and
1:05:04
Dan Powell. Our executive producer is
1:05:07
Jen Poyant. Our audience editor is
1:05:09
Nell Gallogly. Video production by Ryan Manning
1:05:11
and Chris Schott. You can watch
1:05:14
this whole episode on YouTube at
1:05:16
youtube.com/hardfork. Special thanks to Paula
1:05:18
Szuchman, Pui-Wing Tam, Dalia Haddad, and
1:05:21
Jeffrey Miranda. You can email us
1:05:23
at hardfork@nytimes.com with
1:05:26
what you are calling the Gulf
1:05:28
of Mexico.