OpenAI’s Deep Research Team on Why Reinforcement Learning is the Future for AI Agents

Released Tuesday, 25th February 2025

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.


0:00

a lesson that I've seen people learn over and

0:02

over again in this field is like, you know,

0:04

we think that we can do things that are

0:06

smarter than what the models do by writing it

0:08

ourselves, but as the field progresses, the models come

0:11

up with better solutions to things than humans do.

0:13

The like probably like number one lesson in machine

0:15

learning is like, you get what you optimize for.

0:17

And so if you're able to set up the

0:20

system such that you can optimize directly

0:22

for the outcome that you're looking for,

0:24

the results are going to be much,

0:26

much better than if you sort of

0:29

try to glue together models that are

0:31

not optimized end-to-end for the tasks you're trying to have them do.

0:35

So my long-term guidance is that

0:37

I think like reinforcement learning tuning

0:39

on top of models is probably

0:41

going to be a critical part

0:43

of how the most powerful agents get

0:46

built. We're

1:02

excited to welcome Isa Fulford and Josh Tobin

1:04

who lead the Deep Research product at OpenAI. Deep Research launched three weeks ago

1:08

and has quickly become a hit

1:10

product used by many tech luminaries

1:12

like the Collisons for everything from

1:14

industry analysis to medical research to

1:16

birthday party planning. Deep Research was

1:18

trained using end-to-end reinforcement learning on hard

1:21

browsing and reasoning tasks and is the

1:23

second product in a series of agent

1:25

launches from OpenAI, with the first being Operator. We talked to Isa and

1:30

Josh about everything from Deep Research's

1:32

use cases to how the technology

1:34

works under the hood to what

1:36

we should expect in future agent

1:38

launches from OpenAI. Isa and Josh,

1:40

welcome to the show. Thank you. Thank you

1:42

so much for joining us. Excited to

1:44

be here. Thank you for having us.

1:46

So maybe let's start with what is

1:48

Deep Research? Tell us about the origin

1:50

stories and what this product is doing.

1:53

So Deep Research is an agent that is able to search many online websites and it

1:59

can create very comprehensive reports. It

2:01

can do tasks that would take

2:03

humans many hours to complete and

2:05

it's in ChatGPT and it takes

2:07

like five to 30 minutes to

2:09

answer you and so it's able

2:11

to do much more in-depth research

2:13

and answer your questions with much

2:15

more detail and specific sources than

2:17

a regular ChatGPT response would be able to

2:19

do. It's one of the first agents

2:21

that we've released, so we released Operator

2:24

pretty recently as well and so deep research

2:26

is the second agent and you know

2:28

we'll release many more in future. What's

2:30

the origin story behind deep research?

2:33

Like when did you choose to

2:35

do this? What was the inspiration

2:37

and how many people work on

2:39

it? Like what did it take to

2:41

bring this to fruition? Good question. This

2:43

is before my time. So I

2:45

think maybe around a year ago

2:48

we were seeing a lot of

2:50

success internally with this new reasoning

2:52

paradigm and training models to think

2:54

before responding and we were

2:56

focusing a lot on math and science domains

2:58

but I think that the other thing that

3:01

this kind of new reasoning model

3:03

regime unlocks is the ability to

3:05

do longer horizon tasks that involve like

3:07

agentic kind of you know abilities and

3:09

so we thought you know a lot

3:12

of people do tasks that require a

3:14

lot of online research or a lot

3:16

of external context and that involves a

3:18

lot of reasoning and discriminating between sources

3:20

and you have to be quite creative

3:23

to do those kinds of things. And

3:25

I think we finally had models or

3:27

a way of training models that would

3:29

allow us to be able to tackle

3:31

some of those tasks. So we decided

3:34

to try and start training models

3:36

to do first browsing tasks. So

3:38

using the same methods that we

3:40

used to train reasoning models, but

3:42

on more real-world tasks. Was it

3:44

your idea? And Josh, how did

3:47

you get involved? At first, it

3:49

was like me and Josh Patel,

3:51

who's at OpenAI, he's working

3:53

on a similar project that will be

3:55

released at some point which we're very

3:57

excited about and we built an original

3:59

demo. And then also with Thomas Stimson,

4:01

who's one of those people who just,

4:04

is an amazing engineer, like,

4:06

will dive into anything and just,

4:08

you know, get loads of things done. So it

4:10

was very fun. Yeah, and I joined more

4:12

recently. I rejoined OpenAI

4:14

about six months ago from my startup.

4:17

I was at OpenAI in the

4:19

early days and was looking around the

4:21

projects when I rejoined and

4:23

got very interested in some of our

4:25

agentic efforts, including this

4:27

one and got involved with that.

4:30

Amazing. Well, tell us a

4:32

little about who you built it

4:34

for. Yeah, it's really

4:36

for anyone who does knowledge

4:38

work as part of their

4:40

day-to-day job or really as part

4:43

of their life. So we're seeing

4:45

a lot of the usage come

4:47

from people using it for

4:49

work, doing things like research

4:51

as part of their jobs, for

4:54

understanding markets,

4:56

companies, real estate, a

4:58

lot of scientific research, medical, I

5:00

think we've seen a lot of

5:02

medical examples as well. And one of the

5:05

things we're really excited about as well is

5:07

that this style of, like, I just

5:09

need to go out and spend many hours

5:11

doing something that you know where I have

5:13

to do a bunch of web searches and

5:15

collate a bunch of information is not just

5:18

a work thing but it's also useful for

5:20

shopping and travel as well. So we're excited for the

5:22

plus launch so that more people will be able to try

5:24

deep research and maybe we'll see some new use cases as

5:26

well. It's definitely one of the products I've used the most

5:28

over the last couple weeks. It's been amazing. Using it for

5:31

work? For work, definitely. Also for fun. What are you using

5:33

it for? Oh, for me? So I was thinking about buying a new car. It put together an amazing report

5:56

that told me maybe wait a couple

5:58

months, but this year, like in the next

6:01

few months it should come out. Yeah,

6:03

like one of the things that's really

6:05

cool about it is it's like, it's

6:07

not just for going broad and gathering

6:09

all of the information about a

6:11

source, but it's also really good at

6:13

finding like very obscure, like weird

6:15

facts on the internet. Like if

6:18

you have something very specific you

6:20

want to know that you might not just

6:22

turn up in the first page of search results,

6:24

it's good at that kind of thing too. Some people are using it for coding.

6:31

Yeah. Which wasn't really a use

6:33

case I'd considered, but I've seen

6:35

a lot of people on Twitter and

6:37

in various places where we get

6:40

feedback using it for coding and

6:42

code search and also for

6:44

finding the latest documentation on

6:47

a certain package or

6:51

something and helping them write a

6:53

script or something. Yeah, I'm like

6:55

I'm kind of embarrassed that we

6:57

didn't think of that as a use

6:59

case. How do you think usage will evolve over time? Like you

7:01

mentioned the plus launch that's happening, you

7:04

know, in a year's time or two

7:06

years time. Would you guess this is

7:08

mostly a business tool or mostly a

7:10

consumer tool? I would say hopefully both.

7:12

I think it's a pretty general capability,

7:15

and I think it's something that we

7:18

do both in work and in personal

7:20

life. So I'm excited about both.

7:22

I think the magic of it is like,

7:24

um, it just saves people a lot of

7:26

time. You know, if there's... something that

7:28

might have taken you hours or in

7:31

some cases we've heard like days. People

7:33

can just put it in here and

7:35

get you know 90% of what they would have

7:37

come up with on their own. And so

7:39

yeah I tend to think there's like

7:41

there's more tasks like that in business

7:43

than there are in personal but I

7:45

mean I think for sure it's gonna

7:48

be part of people's lives in both.

7:50

It's really become the majority

7:52

of my usage for ChatGPT. I

7:54

just always pick deep research rather

7:56

than normal. So what are you seeing in terms of

7:58

consumer use cases? And what are you excited about? I

8:00

think a lot of shopping, travel

8:02

recommendations. I personally

8:05

used the model a lot.

8:07

I've been using it for months to

8:09

do these kinds of things. We were

8:11

in Japan for the launch

8:13

of deep research so it was

8:15

very helpful in finding restaurants

8:18

and finding things that

8:20

I wouldn't have like

8:22

necessarily found. Yeah and I found

8:24

it like when you have something...

8:26

It's like the kind of thing

8:29

where, you know, if you're shopping,

8:31

maybe for something expensive or you're

8:33

planning a trip that is special

8:36

or that you want to spend a lot of time thinking about.

8:42

It's like, for me, you know, I

8:44

might go and spend hours and hours

8:47

like trying to read everything on

8:49

the internet about this one,

8:51

this product that I'm interested

8:53

in buying, and it can do something like that very quickly. And so

8:57

it's really useful for that kind of

9:00

thing. The model is also very

9:02

good at instruction following. So if you

9:04

have a query with many different parts

9:06

or many different questions, so if

9:09

you want the information about the

9:11

product, but you also want comparisons

9:13

to all other products, and you

9:15

also want information about reviews from,

9:17

you know, Reddit or something

9:19

like that. You can give loads

9:22

of different requirements and

9:24

it will do all of them for you. You can ask

9:26

it to format it in a table. It

9:28

will usually do that anyway, but it's really

9:30

helpful to have a table with a bunch

9:33

of citations and things like that for all

9:35

the categories of things that you want to

9:37

research. Yeah, there are also some features

9:39

that hopefully will get into the product

9:41

at some point, but the underlying model

9:43

is able to embed images so it

9:45

can find images of the products. And

9:48

it's also, this is not a consumer

9:50

use case, but it's able to create

9:52

graphs as well and then embed those

9:54

in its response. Hopefully that will come

9:56

to ChatGPT soon as well. That's a nerdy consumer use case. Yeah. And

10:00

speaking of nerdy consumer use

10:03

cases, also like personalized education

10:05

is a really interesting use

10:07

case. Like if there's a

10:10

topic that you've been meaning to

10:12

learn about, you know, if you need

10:14

to brush up on your biology or,

10:16

you know, you want to learn about

10:19

like some world event, it's,

10:21

it's really good. You know, you put in all the information about

10:25

what you feel like you don't understand, what

10:27

aspects of it you want to go do

10:29

research on and it'll put together a nice

10:31

report for you. One of my friends is considering

10:34

starting a CPG company and he's

10:36

been using it so much to

10:38

find similar products to see if

10:40

specific names are already, you know,

10:43

the domains are already taken, market

10:45

sizing, like all of these different

10:47

things. So that's been fun; he'll share the reports with me and I'll read them. Another pretty fun use case is it's really good at finding like a single obscure

10:58

fact on the internet. Like if

11:00

there's like a, you know, like an

11:02

obscure TV show or something that you

11:04

want to... you know, to like find

11:07

like one particular episode of or

11:09

something like that, it'll go and

11:11

it'll go deep and find the

11:14

like one reference to it on the

11:16

web. Oh yeah, my my brother's

11:18

friend's dad had this very specific

11:20

fact. It was about some

11:22

Austrian general who was involved in the death of someone during a certain battle, like a very niche question, and

11:29

Apparently ChatGPT had previously answered it

11:31

wrong and he was very sure that

11:33

it was wrong. So he went to

11:35

the public library and found a record and

11:38

found that it was wrong and so then

11:40

Deep research was able to get it right

11:42

so we sent it to him and he was excited. What is the rough

11:46

mental model for you know what deep research

11:48

is excellent at today and you know where

11:50

should people be using the O-series of

11:53

models, where should they be using

11:55

deep research? What deep research

11:57

really excels at is if you

11:59

have a... sort of detailed description of

12:01

what you want and in order to

12:03

get the best possible answer requires reading

12:06

a lot of the internet. If

12:08

you have kind of like more of a

12:10

vague question it'll help you kind of

12:12

clarify what you want but it's I

12:14

mean it's it's really at its best

12:16

when there's like a specific set of

12:18

information that you're looking for. And I

12:21

think it's very good at synthesizing

12:23

information it encounters. It's very

12:25

good at finding specific like hard

12:27

to find information, but it's maybe less, it can make kind of some new insights I guess from what it encounters, but I don't think it's making new scientific discoveries yet. And then I

12:42

think using the O-series model for

12:45

me if I'm asking for something

12:47

to do with coding usually it

12:49

doesn't require knowledge outside of

12:52

what the model already knows from

12:54

like, its pre-training, so I would usually use O1 Pro or O1 for coding, or O3-mini-high. And

13:01

so deep research is a great

13:03

example of where some of the

13:05

new product directions for OpenAI

13:07

are going. I'm curious, how can

13:09

the extent you can share, how does

13:11

it work? The model that powers deep

13:14

research is a fine-tuned version of

13:16

O3, which is our most advanced

13:19

reasoning model, and we specifically

13:21

trained it on hard browsing tasks

13:23

that we collected, as well as

13:26

other reasoning tasks. And so it

13:28

also has access to a browsing

13:30

tool and Python tool. So through

13:32

training, end to end on those

13:35

tasks, it learned like strategies to

13:37

solve them. And the resulting models

13:39

good at online search and analysis.

13:42

Yeah, like intuitively, the

13:44

way you can think about it is you

13:46

make this sort of this request,

13:48

ideally a detailed request about what

13:50

you want. The model thinks

13:53

hard about that. It searches for

13:55

information. It pulls that information and it

13:57

reads it and understands how it relates to that request, and then decides

14:01

what to search for next in order

14:03

to get kind of closer to the

14:06

final answer that you want. And it's

14:08

trained to do a good job of

14:10

pulling together all that information into a nice tidy report with citations that point back to the original information that it found. Yeah, I think what's

14:20

new about deep research as an agentic

14:22

capability is that because we have the

14:24

ability to train end to end there

14:27

are a lot of things that that

14:29

you have to do in the process

14:31

of doing research that you couldn't really

14:34

predict beforehand. So I don't think it's

14:36

possible to write some kind of language

14:38

model program or script that would be

14:41

as flexible as what the model is

14:43

able to learn through training where it's

14:45

actually reacting to live web information. And

14:48

based on something it sees, it has

14:50

to change its strategy and things like

14:52

that. So we actually see it doing

14:55

pretty creative searches. You can read the

14:57

chain of thought summary and I'm sure

14:59

you can see sometimes it's very very

15:02

smart about how it comes up with

15:04

the next thing to look for.
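To make that think, search, read, decide loop concrete, here is a minimal sketch of the cycle described above. It is purely illustrative rather than OpenAI's implementation; `llm`, `search`, and `fetch` are hypothetical stand-ins for the trained reasoning model and its browsing tool, stubbed so the sketch runs.

```python
from typing import Callable, List, Tuple

# Hypothetical tool hooks, stubbed so the sketch runs. In the product these
# would be the trained reasoning model, the browsing tool's search action,
# and its page-retrieval action.
llm: Callable[[str], str] = lambda prompt: "DONE"
search: Callable[[str], List[str]] = lambda query: []
fetch: Callable[[str], str] = lambda url: ""

def deep_research(request: str, max_steps: int = 20) -> str:
    notes: List[Tuple[str, str]] = []  # (url, page text) gathered so far
    for _ in range(max_steps):
        # The model reasons over the request plus everything read so far,
        # then either proposes the next search or decides it has enough.
        decision = llm(
            f"Request: {request}\nNotes: {notes}\n"
            "Reply 'NEXT: <query>' to keep searching, or 'DONE' to finish."
        )
        if "NEXT:" not in decision:
            break  # the model decided it has enough information
        query = decision.split("NEXT:", 1)[1].strip()
        for url in search(query)[:3]:
            notes.append((url, fetch(url)))  # read and react to live pages
    # Synthesis step: a report whose claims cite the collected sources.
    return llm(
        f"Request: {request}\nSources: {notes}\n"
        "Write a detailed report, citing each claim by source URL."
    )
```

The point from the conversation is that nothing here hard-codes a browsing strategy: the quality of the decide step is what end-to-end RL training improves.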

15:06

So John Carlson had a tweet that went

15:09

somewhat viral. You know how much of

15:11

the magic of deep research is real-time

15:13

access to web content and how much

15:15

of the magic is in kind of

15:18

chain of thought? Can you maybe shed

15:20

some light on that? I think it's

15:22

definitely a combination. I think you can

15:25

see that because there are other such products that weren't trained end to end, so they won't be as flexible in responding to information they encounter, won't be as creative

15:36

about how to solve specific problems because

15:39

they weren't specifically trained for that purpose.

15:41

So it's definitely a combination. I mean,

15:43

it's a fine-tuned version of O3.

15:46

O3 is a very smart and powerful

15:48

model. A lot of the analysis capability is also from the underlying O3 model

15:53

training. So I think it's definitely a combination. Before OpenAI, I was working at a startup and we were dabbling

16:00

in building agents kind of the way

16:02

that I see most people describe building

16:04

agents on the internet, which is essentially,

16:07

you know, you construct this graph of

16:09

operations and some of the nodes in

16:11

that graph are language models. And so

16:14

you can, the language model can decide

16:16

what to do next, but the overarching

16:18

logic of the sequence of steps that

16:21

happen is defined by a human. What

16:23

we found is that it's really, it's

16:25

like a powerful way of building things

16:28

to get quickly to a prototype, but

16:30

it falls down pretty quickly in the

16:32

real world because it's very hard to

16:35

anticipate all the scenarios that the model

16:37

might face and think about all the

16:39

different branches of the path that you

16:41

might want to take. In addition to

16:44

that, the models often are not the

16:46

best decision makers at nodes in that

16:48

graph because they weren't trained to make those decisions. They were trained to do things that look similar to that. And so

16:58

I think the thing that's really powerful

17:00

about this model is that it's trained

17:02

directly end to end to solve the

17:05

kinds of tasks that users are using

17:07

it to solve. So you don't have

17:09

to set up a graph or make

17:12

those node-like decisions on the architecture on

17:14

the back end. It's all driven by

17:16

the model itself. Yeah.
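By contrast, the hand-wired "graph of operations" pattern Josh describes looks roughly like the sketch below, reusing the hypothetical `llm`, `search`, and `fetch` stubs from the earlier sketch. Here a human fixes the sequence of steps and every fallback branch in advance, which is exactly what breaks on cases the author didn't anticipate.

```python
# Sketch of a human-defined agent graph (hypothetical helpers as above).
# The model only acts inside nodes; the control flow between nodes is code.
def hand_wired_agent(question: str) -> str:
    plan = llm(f"Break this question into search queries:\n{question}")
    findings: List[str] = []
    for query in plan.splitlines():
        hits = search(query)
        if not hits:
            # Every recovery path must be anticipated by the graph's author.
            hits = search(llm(f"Rephrase this query: {query}"))
        findings.extend(fetch(url) for url in hits[:3])
    return llm(f"Answer '{question}' using:\n{findings}")
```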

17:19

Can you say more about this? You know, it seems

17:21

like that's one of the very opinionated

17:23

decisions that you've made and clearly it's

17:26

worked. There's so many companies that are

17:28

building on your API, kind of prompting it to, you know, solve

17:33

specific tasks for specific users. Do you

17:35

think a lot of those applications would

17:37

be better served by kind of having,

17:40

you know, trained models end-to-end for their

17:42

specific workflows? I think if you have

17:44

a very specific workflow that is quite

17:47

predictable, it makes a lot of sense

17:49

to do something like Josh described, but

17:51

if you have something that has a

17:54

lot of edge cases or it needs

17:56

to be quite flexible, then I think

17:58

something similar to Deep Research is probably

18:01

a better approach. Yeah, I think like

18:03

the guidance I give people is the

18:05

one thing that you don't want to

18:07

bake into the model is like kind

18:10

of hard and fast rules. Like if

18:12

you have, you know, a database that

18:14

you don't want the model to touch

18:17

or something like that, it's better to

18:19

encode that in human written logic, but

18:21

I think it's kind of like a

18:24

lesson that I've seen people learn over

18:26

and over again in this field is

18:28

like, you know, we think that we

18:31

can do things that are smarter than

18:33

what the models do by writing it

18:35

ourselves. But in reality, like usually as

18:38

the field progresses, the models come up

18:40

with better solutions to things than humans

18:42

do. And also like, you know, the

18:45

like probably like number one lesson in

18:47

machine learning is like you get what

18:49

you optimize for. And so if you're

18:52

able to set up the system such

18:54

that you can optimize directly for the

18:56

outcome that you're looking for, the results

18:59

are going to be much, much better

19:01

than if you sort of try to

19:03

glue together models that are not optimized

19:06

end-to-end for the tasks you're trying to have them do. So my long-term

19:10

guidance is that I think like reinforcement

19:13

learning tuning on top of models is

19:15

probably going to be a critical part

19:17

of how the most powerful agents get

19:20

built.
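As a toy illustration of "you get what you optimize for": in RL tuning, the graded signal is computed on the agent's final output rather than on any scripted intermediate step. This is a hedged sketch with an invented rubric format, not how OpenAI actually grades tasks.

```python
# Toy outcome-based reward for RL tuning (illustrative, not OpenAI's code).
# Only the final report is scored, so the policy is free to discover
# whatever browsing strategy maximizes the outcome.
def outcome_reward(final_report: str, task: dict) -> float:
    required = task["required_facts"]  # hypothetical grading rubric
    found = sum(fact.lower() in final_report.lower() for fact in required)
    cited = 0.5 if "http" in final_report else 0.0  # citations count too
    return (found + cited) / (len(required) + 0.5)  # normalized to [0, 1]
```

An RL loop would roll the agent out on a task, score only the final answer with something like this, and update the policy toward higher reward.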

19:22

What were the biggest technical challenges along the way to making this work?

19:24

Well, I mean, maybe I can say

19:26

as like an observer rather than someone

19:29

who was involved in this from the

19:31

beginning, but it seems like kind of

19:33

one of the things that Isa and

19:36

the rest of the team worked really,

19:38

really hard on and was kind of

19:40

like one of the hidden keys to

19:43

success was like making really high quality

19:45

data sets. It's another one of those

19:47

like age old lessons in machine learning

19:50

that people keep re learning, but the

19:52

quality of the data that you put

19:54

into the model is probably the biggest

19:57

determining factor in the quality of the

19:59

model that you get on the other

20:01

side. Edward's another person who works on the project who will just take any data set and optimize it, so that's the... secret to success. Find your Edward. Great, great. Machine

20:11

learning, model training. How do you make

20:13

sure that it's right? Yeah, so that's

20:15

obviously a cool part of this model

20:18

and product is that we want users to be able to trust the outputs. So part of that

20:25

is we have citations and so users

20:27

are able to see where the model

20:29

is citing its information from. And we,

20:32

during training, that's something that we actually

20:34

try and make sure is correct, but

20:36

it's still possible for the model to

20:39

make mistakes or hallucinate or trust a

20:41

source that maybe isn't the most trustworthy

20:43

source of information. So that's definitely an

20:46

active area where we want to continue

20:48

improving the model. How should we think

20:50

about this together with, you know, O3

20:52

and Operator and other different releases? Like, does this use Operator? Do these all

20:57

build on top of each other or

20:59

are they all kind of a series

21:02

of different applications of O3? Today, these

21:04

are pretty disconnected. But you can kind

21:06

of, you can imagine kind of where

21:09

we're going with this, right, which is

21:11

like, the ultimate agent that people have

21:13

access to. at some point in the

21:16

future should be able to do, you

21:18

know, not just web search or using

21:20

a computer or any of the other

21:23

types of actions that you'd want, like

21:25

kind of a human assistant to do,

21:27

but should be able to fuse all

21:30

these things in a more natural way.

21:32

Any other design decisions that, you know,

21:34

you've taken that maybe not obvious at

21:37

first glance? I think one of them

21:39

is the clarification flow. So if you've

21:41

used deep research, the model will ask

21:44

you questions before starting research. And usually

21:46

ChatGPT maybe will ask you a question

21:48

at the end of its response, but

21:51

it usually doesn't have such a, that

21:53

kind of behavior up front. And that

21:55

was intentional because you will get the

21:58

best response from the deep research model

22:00

if... the prompt is very well specified

22:02

and detailed. And I think that it's

22:05

not the natural user behavior to give

22:07

all of the information in the first

22:09

prompt. So we wanted to make sure

22:12

that if you're going to wait five

22:14

minutes, 30 minutes, that your response is

22:16

as detailed and satisfactory as possible. So we

22:18

added this additional step to make sure

22:21

that the user provides all the detail

22:23

that we would need. And I've actually

22:25

seen a bunch of people on Twitter

22:28

saying that they have this flow or

22:30

that they will talk to O1 or O1 Pro to help make their prompt

22:35

more detailed. And then once they're happy

22:37

with the prompt, then they'll send it

22:39

to deep research, which is interesting. So

22:42

people are finding their own workflows for

22:44

how to use this.
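That prompt-refinement workflow looks roughly like the sketch below, reusing the hypothetical `llm` stub and the `deep_research` loop sketched earlier; in the product itself, the clarification step is built in.

```python
# Sketch of the clarify-then-research flow (hypothetical helpers as above).
def clarify_then_research(user_prompt: str) -> str:
    # Surface what's missing before committing to a 5-30 minute run.
    questions = llm(
        f"A user asked: {user_prompt}\n"
        "List the clarifying questions needed to fully specify this task."
    )
    answers = input(questions)  # the product asks these up front instead
    detailed = f"{user_prompt}\n\nClarifications:\n{answers}"
    return deep_research(detailed)  # launch the long-running research agent
```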

22:47

So there's been three different deep research products

22:49

launched in the last few months. Tell

22:51

us a little about what makes you

22:53

guys special and how we should think

22:56

about it. And they're all called deep

22:58

research, right? They're all called deep research.

23:00

Yeah, not a lot of naming creativity in this field. I think people should trial them for themselves and get a feel. I think the difference in, like, quality, I

23:12

think they all have pros and cons,

23:14

but I think the difference will be

23:16

clear But what that comes down to

23:18

is just the way that this model

23:21

was built. And the sort of the

23:23

effort that went into constructing the data

23:25

sets and then the the engine that

23:27

we have with the O-series models, which

23:30

allows us to just optimize models to

23:32

make things that are like really smart

23:34

and really high quality. We had the

23:37

O-1 team on the podcast last year

23:39

and we were joking that OpenAI is

23:41

not that good at naming things. I

23:43

will say this is your best-named product.

23:46

Deep research. At least it describes what

23:48

it does, I guess. Yeah. So I'm

23:50

curious to hear a little about where

23:52

you want to go from here. You

23:55

have deep research today, what do you

23:57

think it looks like a year from

23:59

now, and what maybe our complementary things

24:02

you want to build along the way?

24:04

Well, we're excited to expand the data sources

24:06

that the model has access to. We've

24:08

trained a model that's generally very good

24:11

at browsing public information, but it should

24:13

also be able to search private data

24:15

as well. And then I think just

24:18

pushing the capabilities further, so it could

24:20

be better at browsing, it could be

24:22

better at analysis. And then thinking about

24:24

how this fits into our agent roadmap

24:27

more broadly. Like I think the recipe

24:29

here is something that's going to scale

24:31

to a pretty wide range of use

24:33

cases, things that are going to surprise

24:36

people how well they work. But this

24:38

idea of you take a state-of-the-art reasoning

24:40

model, you give it access to the

24:43

same tools that humans can use to

24:45

do their jobs or to go about

24:47

their daily lives, and then you optimize

24:49

directly for the kinds of outcomes that

24:52

you're looking for, that you want the agent

24:54

to be able to do. That recipe,

24:56

there's like really nothing stopping that recipe

24:58

from scaling to more and more complex

25:01

tasks. So I feel like, yeah, AGI

25:03

is like an operational problem now. And

25:05

I think, yeah, a lot of things

25:08

to come in that general formula. So

25:10

Sam had a pretty striking quote of

25:12

deep research will kind of take over

25:14

a single digit percentage of all economically

25:17

viable tasks in the world. How should

25:19

we think about that? Deep Research is

25:21

not capable of doing all of what

25:24

you do, but it is capable of

25:26

saving you like hours or sometimes, in

25:28

some cases, days at a time. And

25:30

so I think like, what we're hopefully

25:33

relatively close to is deep research and

25:35

the agents that we build on top

25:37

of it, giving you, you know, one,

25:39

five, ten, 25% of your time back,

25:42

depending on the type of work that

25:44

you do. I mean, I think it's really, like, 80% of what I

25:49

do. So it's definitely on the higher

25:51

end for me. We just need to

25:53

start writing checks, I guess. Yeah. Are

25:55

there entire job categories that you think

25:58

are kind of more at risk is

26:00

the wrong word, but like more in

26:02

the in the strike zone for what

26:04

deep research is exceptional? So for example,

26:07

I'm thinking consulting, but like are there

26:09

specific categories that you think are more

26:11

in strike zone? Yeah, I used to

26:14

be in consulting. I don't think any jobs are at risk at all. Like it's,

26:18

but for these types of knowledge work

26:20

jobs where like where you are spending

26:23

a lot of your time kind of

26:25

looking through information making conclusions, I think

26:27

it's going to give people superpowers. Yeah,

26:29

I'm very excited about a lot of

26:32

the medical use cases, just the ability

26:34

to find all of the literature or

26:36

all of the recent cases for a

26:39

certain condition. I think I've already seen

26:41

a lot of doctors posting about this

26:43

or like they've reached out to us

26:45

and said oh we used it for

26:48

this thing we used it to help

26:50

find a clinical trial for this patient

26:52

or something like that so just people

26:55

who are already so busy just saving

26:57

some time or it's maybe something that

26:59

they wouldn't have had time to do

27:01

so, and now they are able to have that information. Yeah

27:06

and I think the like the impact

27:08

of that is like maybe a little

27:10

bit more profound than it sounds on

27:13

the surface right it's not just like

27:15

you know getting 5% of your time

27:17

back but it's the type of thing

27:20

that might have taken you four hours

27:22

or eight hours to do, now you

27:24

can do for, you know, a ChatGPT subscription and five minutes. And so,

27:29

like, what types of things would you

27:31

do if you had infinite time that

27:33

now maybe you can do, like, many,

27:35

many copies of? So, like, you know,

27:38

should you do research on every single

27:40

possible startup that you could invest in

27:42

instead of just the ones that you

27:45

have time to meet with, things like

27:47

that? Or on the consumer side, one

27:49

thing that I'm thinking of is, you

27:51

know, the working mom that's too busy

27:54

to plan a birthday party for her

27:56

toddler. Like, now she can. So, I agree with you, it's

28:01

way more important than 5% of your

28:03

time. It's all the things you couldn't

28:05

do before. Exactly. What does this change

28:07

about education and the way we should

28:10

learn? And you know, what will you

28:12

be teaching your kids now that we're

28:14

in the world of agents in deep

28:16

research? Education's been like one of the

28:19

top few things that people use it

28:21

for. I mean, this is true for ChatGPT generally. It's like learning things by talking

28:30

to an AI system that is able

28:32

to like personalize the information it gives

28:35

you based on what you tell it

28:37

or maybe in the future what it

28:39

knows about you. It feels like a

28:41

much more efficient way to learn and

28:44

a much more engaging way to learn

28:46

than like reading textbooks. We have some

28:48

lightning round questions. All right? Okay, your

28:51

favorite deep research use case. I'll say

28:53

yeah, like personalized education, just like learning

28:55

about anything I want to learn about.

28:57

I've already mentioned this, but I think

29:00

a lot of the personal stories that

29:02

people have shared about finding information about

29:04

a diagnosis that they've received or someone

29:07

in their family received have been really

29:09

great to see. Okay, we saw a

29:11

few application categories breakout last year. So

29:13

for example, coding being an obvious one.

29:16

What application categories do you think will

29:18

break out this year? I mean, clearly

29:20

agents. Agents. I was going to say

29:22

too. I think it's like it's so

29:25

hard to keep up with the state

29:27

of the art in AI. I think

29:29

that you should recommend people reading to

29:32

read to learn more about agents or

29:34

where the state of AI is going.

29:36

Could be an author too. Training data.

29:38

Yeah, this fun cost. I think it's

29:41

like it's so hard to keep up

29:43

with the state of the art in

29:45

AI. I think the like the general

29:47

advice I have for people is like,

29:50

pick one or two subtopics that you're

29:52

really interested in and go like curate

29:54

a list of people who you think are saying interesting things about it, and like, go deep on those one or two things you're interested in. Maybe

30:03

actually that's a good deep research use

30:06

case. Like, you know, go use it to, like, go deep

30:10

on things that you want to learn

30:12

more about. This is a bit old

30:15

now, but I think a few years

30:17

ago I watched, I think it was a good introduction to reinforcement learning, so yeah, I would definitely second any content by Pieter Abbeel, my grad school advisor. Yeah, oh yeah. Okay, reinforcement learning: it kind of went through a peak, and then it felt like it was in a little bit of a doldrums again, and is peaking again. Is that the right read on what's happening with RL? It's so back. Yeah? Why now? Because everything else is working. Like, I think, maybe people who've

30:49

been following the field for a while

30:51

will remember the Yann LeCun cake analogy.

30:53

If you're building a cake, then most

30:56

of the cake is the cake. And

30:58

then there's a little bit of frosting

31:00

and then there's a few cherries on

31:03

top. And the analogy was that unsupervised learning is the cake, supervised learning is the frosting, and reinforcement learning is the cherries on top. When we

31:09

in the field were working on reinforcement

31:12

learning back in, you know, 2015, 2016,

31:14

it's kind of like... I think Yann LeCun's analogy, which I think in retrospect

31:18

is probably correct, is that we were

31:21

like trying to add the cherries before

31:23

we had the cake. But now we

31:25

have language models that are pre-trained on

31:28

massive amounts of data and are incredibly

31:30

capable. We know how to,

31:32

you know, do supervised fine tuning on

31:34

those language models to make them good

31:37

at instruction following and like generally doing

31:39

the things that people want them to

31:41

do. And so now that that works

31:44

really well, it's like very ripe to tune those models for any kind of use case that you can define a reward function for. Great. Okay. So from

31:53

this lightning round we got agents, you know, the breakout

31:57

category in 2025 and reinforcement

31:59

learning is so

32:02

back. I love it. Thank you guys so much for joining us. We loved this conversation. Deep research is an incredible product and we can't wait to see what comes of it. Thank you. Thank you.
