The Robotics Revolution, with Physical Intelligence’s Cofounder Chelsea Finn

Released Thursday, 20th March 2025

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.


0:00

Hi listeners, welcome to No Priors. This week we're speaking to Chelsea Finn, co-founder of Physical Intelligence, a company bringing general-purpose AI into the physical world. Chelsea co-founded Physical Intelligence alongside a team of leading researchers and minds in the field. She's an associate professor of computer science and electrical engineering at Stanford University, and prior to that she worked at Google Brain and was at Berkeley. Chelsea's research focuses on how AI systems can acquire general-purpose skills through interactions with the world. So Chelsea, thank you so much for joining us today on No Priors.

0:32

Yeah, thanks for having me.

0:36

You've done a lot of really important, storied work in robotics between your work at Google, at Stanford, etc. So I would just love to hear a little bit firsthand about your background in terms of your path in the world of robotics, what drew you to it initially, and some of the work that you've done.

0:54

…in the world, but at the same time I was also really fascinated by this problem of developing perception and intelligence in machines, and robots embody all of that. And also, sometimes there's some cool math that you can do as well that keeps your brain active, makes you think. And so I think all of that is really fun about working in the field.

1:16

I started working more seriously in robotics more than 10 years ago at this point, at the start of my PhD at Berkeley, and we were working on neural network control, trying to train neural networks that map from image pixels directly to motor torques on a robot arm. At the time, this was not very popular, and we've come a long way; it's a lot more accepted in robotics and also just generally something that a lot of people are excited about.
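
To make the "pixels to torques" idea she describes concrete, here is a minimal sketch of such a visuomotor policy in PyTorch. The architecture, layer sizes, and joint count below are illustrative assumptions, not the network her group actually used: a small convolutional encoder reads a camera frame and an MLP head regresses one torque per joint.

```python
import torch
import torch.nn as nn

class PixelsToTorquesPolicy(nn.Module):
    """Illustrative visuomotor policy: camera image in, joint torques out."""

    def __init__(self, num_joints: int = 7):
        super().__init__()
        # Convolutional encoder turns the RGB frame into a feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # MLP head regresses one torque command per joint.
        self.head = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, num_joints),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(image))

# Example: one 128x128 RGB frame in, seven torque commands out.
policy = PixelsToTorquesPolicy(num_joints=7)
torques = policy(torch.randn(1, 3, 128, 128))
print(torques.shape)  # torch.Size([1, 7])
```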

1:47

Since that beginning point, it was very clear to me that we could train robots to do pretty cool things, but that getting the robot to do one of those things in many scenarios with many objects was a major, major challenge. So 10 years ago we were training robots to, like, screw a cap onto a bottle and use a spatula to lift an object into a bowl and kind of do a tight insertion or hang up, like, a hanger on a clothes rack. And so, pretty cool stuff, but actually getting the robot to do that in many environments with many objects, that's where a big part of the challenge comes in, and I've been thinking about ways to make broader data sets, train on those broader data sets, and also different approaches for learning, whether it be reinforcement learning, video prediction learning, all those things.

2:33

And so, yeah, I moved to Google Brain in between my PhD and joining Stanford, became a professor at Stanford, started a lab there, did a lot of work along all these lines, and then recently started Physical Intelligence, almost a year ago at this point. So I've been on leave from Stanford for that, and it's been really exciting to be able to try to execute on the vision that we as co-founders collectively have and do it with a lot of resources and so forth. And I'm also still advising students at Stanford as well.

3:03

That's really cool. And I guess you started Physical Intelligence with four other co-founders and an incredibly impressive team. Could you tell us a little bit more about what Physical Intelligence is working on and the approach that you're taking? Because I think it's a pretty unique slant on the whole field and approach.

3:20

Yeah, so we're trying to build a big neural network model that could ultimately control any robot to do anything in any scenario. And, like, a big part of our vision is that in the past, robotics has focused on, like, trying to go deep on one application and, like, developing a robot to do one thing, and then ultimately gotten kind of stuck in that one application. It's really hard to, like, solve one thing and then try to get out of that and broaden, and instead we're really in it for the long term, to try to address this broader problem of physical intelligence in the real world. We're thinking a lot about generalization and generalists, and unlike other robotics companies, we think that being able to leverage all of the possible data is very important. And this comes down to actually not just leveraging data from one robot, but from any robot platform that might have six joints or seven joints or two arms or one arm. We've seen a lot of evidence that you could actually transfer a lot of rich information across these different embodiments, and it allows you to use more data. And also, if you iterate on your robot platform, you don't have to throw all your data away. I have faced a lot of pain in the past where we got a new version of the robot, and it's a really painful process to try to get back to where you were on the previous robot iteration. So yeah, trying to build generalist robots and essentially kind of develop foundation models that will power the next generation of robots in the real world.
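
One way to picture the cross-embodiment point above is the data plumbing: before a single model can train on arms with different joint counts, each robot's action vector has to be mapped into a shared format. The sketch below is an assumption-laden illustration; the padded width, function names, and masking scheme are made up for this example and are not Physical Intelligence's actual pipeline.

```python
import numpy as np

MAX_ACTION_DIM = 16  # assumed shared width, large enough for any platform we ingest

def to_shared_action(action: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pad a robot-specific action vector into a shared action space.

    Returns the padded action and a mask marking which dimensions are real,
    so a training loss can ignore the padding for low-DoF robots.
    """
    dim = action.shape[-1]
    assert dim <= MAX_ACTION_DIM, "increase MAX_ACTION_DIM"
    padded = np.zeros(MAX_ACTION_DIM, dtype=np.float32)
    padded[:dim] = action
    mask = np.zeros(MAX_ACTION_DIM, dtype=np.float32)
    mask[:dim] = 1.0
    return padded, mask

# A 6-joint arm and a two-arm, 14-joint platform end up in the same format.
single_arm = np.random.uniform(-1, 1, size=6)
bimanual = np.random.uniform(-1, 1, size=14)
for a in (single_arm, bimanual):
    padded, mask = to_shared_action(a)
    print(padded.shape, int(mask.sum()))  # (16,) 6   then   (16,) 14
```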

4:49

That's really cool, because I mean, I guess there's a lot of sort of parallels to the large language model world, where, you know, really a mixture of deep learning, the transformer architecture, and scale has really proven out that you can get real generalizability and different forms of transfer between different areas. Could you tell us a little bit more about the architecture you're taking, or the approach, or, you know, how you're thinking about the basis for the foundation model that you're developing?

5:13

At the beginning, we were just getting off the ground; we were trying to scale data collection, and a big part of that is that, unlike in language, we don't have Wikipedia or an internet of robot motions, and we're really excited about scaling data on real robots in the real world. This kind of real data is what has fueled machine learning advances in the past, and a big part of that is we actually need to collect that data, and that looks like teleoperating robots in the physical world. We're also exploring other ways of scaling data as well, but the kind of bread and butter is scaling real robot data. We released something in late October where we showed some of our initial efforts around scaling data and how we can learn very complex tasks of folding laundry, cleaning tables, constructing a cardboard box. Now, where we are in that journey is really thinking a lot about language interaction and generalization to different environments. So what we showed in October was the robot in one environment, and it had data in that environment. We were able to see some amount of generalization, so it was able to fold shorts that it had never seen before, but the degree of generalization was very limited, and you also couldn't interact with it in any way. You couldn't prompt it and tell it what you want it to do beyond kind of fairly basic things that it saw in the training data. And so being able to handle lots of different prompts in lots of different environments is a big focus right now.

6:39

And in terms of the architecture, we're using transformers, and we are using pre-trained models, pre-trained vision language models, and that allows you to leverage all of the rich information on the internet. We had a research result a couple years ago where we showed that if you leverage vision language models, then you could actually get the robot to do tasks that require concepts that were never in the robot's training data, but were on the internet. Like, one famous example is that you can ask it to pass the Coke can to Taylor Swift, and the robot has never seen Taylor Swift in person, but the internet has lots of images of Taylor Swift in it. And you can leverage all of the information in the pre-trained model and kind of transfer that to the robot. We're not starting from scratch, and that helps a lot as well. So that's a little bit about the approach. Happy to dive deeper as well.
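
The pattern she's describing, a web-pretrained vision-language backbone with a small robotics-specific action head on top, roughly looks like the sketch below. `PretrainedVLM` is a toy stand-in rather than a real model class, and every dimension is invented; the point is only that the internet-scale knowledge sits in the backbone (frozen here) while the action head is the part trained on robot data.

```python
import torch
import torch.nn as nn

class PretrainedVLM(nn.Module):
    """Placeholder for a vision-language backbone pretrained on web data."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.embed_dim = embed_dim
        # Toy stand-in: a real VLM would use image patches and token embeddings.
        self.proj = nn.Linear(3 * 224 * 224 + 32, embed_dim)

    def forward(self, image: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        flat = torch.cat([image.flatten(1), text_ids.float()], dim=1)
        return self.proj(flat)

class VLAPolicy(nn.Module):
    """Vision-language-action policy: pretrained backbone plus a small action head."""
    def __init__(self, backbone: PretrainedVLM, action_dim: int = 14):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # keep the web-derived knowledge frozen in this sketch
        self.action_head = nn.Sequential(
            nn.Linear(backbone.embed_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image, text_ids):
        return self.action_head(self.backbone(image, text_ids))

policy = VLAPolicy(PretrainedVLM())
action = policy(torch.randn(1, 3, 224, 224), torch.randint(0, 1000, (1, 32)))
print(action.shape)  # torch.Size([1, 14])
```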

7:31

That's really amazing. And then, um, what do you think is the main basis for really getting to generalizability? Is it scaling data further? Is it, like, as you think through the common pieces that people are spending a lot of time on, reasoning modules and other things like that as well? So I'm curious, like, what are the components that you feel are missing right now?

7:57

Yeah, so I think the number one thing, and this is kind of the boring thing, is just getting more diverse robot data. So for that release that we had in late October last year, we collected data in three buildings, technically. The internet, for example, and everything that has fueled language models and vision models, is way, way more diverse than that, because the internet is pictures that are taken by lots of people and text written by lots of different people. And so just trying to collect data in many more diverse places and with many more objects, many more tasks. So scaling the diversity of the data, not just the quantity of the data, is very important, and that's a big thing that we're focusing on right now, actually bringing our robots into lots of different places and collecting data in them. As a side product of that, we also learn what it takes to actually get your robot to be operational and functional in lots of different places. And that is a really nice byproduct, because if you actually want to get robots to work in the real world, you need to be able to do that. So that's the number one thing, but then we're also exploring other things: leveraging videos of people, again leveraging data from the web, leveraging pre-trained models, thinking about reasoning, although more basic forms of reasoning. In order to, for example, put a dirty shirt into a hamper, if you can recognize where the shirt is and where the hamper is and what you need to do to accomplish that task, that's useful. Or if you want to make a sandwich and the user has a particular request in mind, you should reason through that request: if they're allergic to pickles, you probably shouldn't put pickles on the sandwich. Things like that. So there's some basic things around there, although the number one thing is just more diverse robot data.

9:39

And then, I think a lot of the pursuit to date has really had an emphasis on releasing open-source models and packages for robotics. Do you think that's the long-term path? Do you think it's open core? Do you think it's eventually proprietary models? Or how do you think about that part of the industry? Because it feels like there's a few different robotics companies now, each taking different approaches in terms of either hardware only, I mean, excuse me, hardware plus software, where they're focused on a specific hardware footprint; there's software; and there's closed source versus open source if you're just doing the software. So I'm sort of curious where in that spectrum Physical Intelligence lies.

10:15

Definitely. So we've actually been quite open. Not only have we open-sourced some of the weights and released details and technical papers, we've actually also been working with hardware companies and giving designs of robots to hardware companies. And some people, like, when I tell people this, sometimes they're actually really shocked: like, what about the IP, what about, I don't know, confidentiality and stuff like that? And we've actually made a very intentional choice around this. There's a couple of reasons for it. One is that we think that the field, it's really just the beginning, and these models will be so, so much better, and the robots should be so much better in a year, in three years, and we want to support the development of the research, and we want to support the community, support the robots, so that when we hopefully develop the technology of these generalist models, the world will be more ready for it, will have better, like, more robust robots that are able to leverage those models, people who have the expertise and understand what it requires to use those models. And then the other thing is also, like, we have a really fantastic team of researchers and engineers, and really fantastic researchers and engineers want to work at companies that are open, especially on the research side, where they can get kind of credit for their work and share their ideas, talk about their ideas. And we think that having the best researchers and engineers will be necessary for solving this problem. The last thing that I'll mention is that I think the biggest risk with this bet is that it won't work. Like, I'm not really worried about competitors. I'm more worried that no one will solve the problem.

11:51

Oh, interesting. And why do you worry about that?

11:53

I think robotics is, it's very hard. And there have been many, many failures in the past, and unlike when you're, like, recognizing an object in an image, there's very little tolerance for error. You can miss a grasp on an object, or, like, the difference between making contact and not making contact with an object is so small, and it has a massive impact on the outcome of whether the robot can actually successfully manipulate the object. And I mean, that's just one example. There's challenges on the data side of collecting data. Well, just anything involving hardware is hard as well.

12:25

I guess we have a number of examples now of robots in the physical world. You know, everything from autopilot on a jet on through to some forms of pick-and-pack or other types of robots in distribution centers, and there's obviously the different robots involved with manufacturing, particularly in automotive, right? So there's been a handful of more constrained environments where people have been using them in different ways. Where do you think the impact of these models will first show up? Because to your point, there are certain things where you have very low tolerance for error, and then there's a lot of fields where actually it's okay, or maybe you can constrain the problem sufficiently relative to the capabilities of the model that it works fine. Where do you think physical intelligence will have the nearest-term impact, or in general where the field of robotics and these new approaches will substantiate themselves?

13:13

Yeah, as a company we're really focused on the long-term problem and not, like, any one particular application, because of the failure modes that can come up when you focus on one application, so I don't know where the first applications will be. I think one thing that's actually challenging is that typically in machine learning, a lot of the successful applications, like recommender systems, language models, like image detection, a lot of the consumers of the model outputs are actually humans who could actually check it, and the humans are good at the thing. A lot of the very natural applications of robots is actually the robot doing something autonomously on its own, where it's not, like, a human consuming the commanded arm position, for example, and then checking it and then validating it and so forth. And so I think we need to think about new ways of having some kind of tolerance for mistakes, or scenarios where that's fine, or scenarios where humans and robots can work together. That's, I think, one big challenge that will come up when trying to actually deploy these, and some of the language interaction work that we've been doing is actually motivated by this challenge, where we think it's really important for humans to be able to kind of provide input for how they want the robot to behave and what they want the robot to do, how they want the robot to help in a particular scenario.

14:29

That makes sense. I guess the other form of generalizability, to some extent, at least in our current world, is the human form, right? And so some people are specifically focused on humanoid robots, like Tesla and others, under the assumption that the world is designed for people and therefore it's the perfect form factor to coexist with people. And then other people have taken very different approaches, in terms of saying, I need something that's more specialized for the home in certain ways, or for factories or manufacturing or you name it. What is your view on kind of humanoid versus not?

15:01

On one hand, I think that they're a little overrated. And one way to practically look at it is, I think that we're generally fairly bottlenecked on data right now. And some people argue that with humanoids you can maybe collect data more easily because it matches the human form factor, and so maybe it'd be easier to mimic humans. And I've actually heard people make those arguments, but if you've ever actually tried to teleoperate a humanoid, it's actually a lot harder to teleoperate than a static manipulator or a mobile manipulator with wheels. Optimizing for being able to collect data, I think, is very important, because if we can get to the point where we have more data than we could ever want, then it just comes down to research and compute and evaluations. And so that's one of the things we're kind of optimizing for, and so we're using cheap robots. We're using robots that we can very easily develop teleoperation interfaces for, with which you can do teleoperation very quickly and collect diverse data, collect lots of data.

15:59

Yeah, it's funny, there was that viral fake Kim Kardashian video of her going shopping with a robot following her around carrying all of her shopping bags. When I saw that, I really wanted a humanoid robot to follow me around everywhere. That'd be really funny to do that. So I'm hopeful that someday I can use your software to cause a robot to follow me around to do things. So, exciting future. How do you think about the embodied model of development versus not on some of these things, in terms of that being another sort of trade-off that some people are making or deciding between?

16:32

A lot of the AI community is very focused on just, like, language models, vision language models and so forth, and there's, like, a ton of hype around, like, reasoning and stuff like that: oh, let's create, like, the most intelligent thing. I feel like actually people underestimate how much intelligence goes into motor control. Many, many years of evolution is what led to us being able to use our hands the way that we do, and there are many animals that can't do it, even though they had so many years of evolution. And so I think that there's actually so much complexity and intelligence that goes into being able to do something as basic as make a bowl of cereal or pour a glass of water. And yeah, so in some ways I think that, actually, embodied intelligence or physical intelligence is very core to intelligence, and maybe kind of underrated compared to some of the less embodied models.

17:24

One of the papers that I really loved over the last couple of years in robotics was your ALOHA paper, and I thought it was a very clever approach. What is some of the research over the last two or three years that you think has really caused this flurry of activity? Because I feel like there's been a number of people now starting companies in this area, because a lot of people feel like now is the time to do it. And I'm a little bit curious what research you feel was the basis for that shift and for people thinking this was a good place to work.

17:56

At least for us, there were a few things that we felt like were turning points, where it felt like the field was moving a lot faster compared to where it was before. One was the SayCan work, where we found that you can plan with language models as kind of the high-level part and then kind of plug that in with a low-level model to get a model to do long-horizon tasks. One was the RT-2 work, which showed that you could do the Taylor Swift example that I mentioned earlier and be able to plug in a lot of the web data and get better generalization on robots. A third was our RT-X work, where we were actually able to train models across robot embodiments, and significantly, we basically took all the robot data that different research labs had (it's a huge effort to aggregate that into a common format) and trained on it. And when we trained on that, we actually found that we could take a checkpoint, send that model checkpoint to another lab halfway across the country, and the grad student at that lab could run the checkpoint on their robot, and it would actually, more often than not, do better than the model that they had specifically iterated on themselves in their own lab. And that was, like, another big sign that this stuff is actually starting to work and that you can get benefit by pooling data across different robots. And then also, like you mentioned, I think the ALOHA work and later the Mobile ALOHA work showed that you can teleoperate and get models to train on pretty complicated dexterous manipulation tasks. We also had a follow-up paper with the shoelace tying. That was a fun project, because someone said that they would retire if they saw a robot tie shoelaces.

19:36

So did they retire?

19:38

They did not retire.

19:40

We need to force them into retirement. Whoever that person is, we need to follow up on that.

19:44

Yeah, so those were a few examples. And so, yeah, I think we've seen a ton of progress in the field. It also seems like after we started Pi, that was kind of a sign to others that if the experts are really willing to bet on this, then maybe something will happen.

20:04

So, one thing that you all came out with today from Pi was what you call a hierarchical interactive robot, or Hi Robot. Can you tell us a little bit more about that?

20:12

So this is a really fun project. There's two things that we're trying to look at here. One is that if you need to do, like, a longer-horizon task, meaning a task that might take minutes to do, then if you just train a single policy to, like, output actions based on images (like if you're trying to make a sandwich and you train a policy that's just outputting the next motor command), that might not do as well as something that's actually kind of thinking through the steps to accomplish that task. That was kind of the first component; that's where the hierarchy comes in. And the second component is, a lot of the times when we train robot policies, we're just saying, like, we'll take our data, we'll annotate it and say, like, this is picking up the sponge, this is putting the bowl in the bin, this segment is, I don't know, folding the shirt, and then you get a policy that can, like, follow those basic commands of, like, fold the shirt, or pick up the cup, those sorts of things. But at the end of the day, we don't want robots just to be able to do that. We want them to be able to interact: if you don't want certain ingredients, maybe don't include those, and maybe also be able to interject in the middle and say, like, oh, hold off on the tomatoes or something. It's actually kind of a big gap between something that can just follow, like, an instruction like "pick up the cup" and something that could handle those kinds of prompts and those situated corrections and so forth. And so we developed a system that basically has one model that takes as input the prompt and kind of reasons through it and is able to output, like, the next step that the robot should follow (that's kind of like it's going to tell it that the next thing will be "pick up the tomato," for example), and then a lower-level model that takes as its input "pick up the tomato" and outputs the sequence of motor commands for the next, like, half second. That's the gist of it. It was a lot of fun because we actually got the robot to make a vegetarian sandwich or a ham and cheese sandwich or whatever. We also did a grocery shopping example and a table cleaning example, and I was excited about it, first because it was just, like, cool to see the robot be able to respond to different prompts and do these challenging tasks, and second because it actually seems like the right approach for solving the problem.
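
A hedged sketch of that two-level structure, with hypothetical function names since the transcript doesn't spell out the interfaces: a high-level model turns the open-ended prompt and current image into the next short language instruction, and a low-level policy turns that instruction plus the image into a chunk of motor commands covering roughly the next half second.

```python
import numpy as np

def high_level_step(prompt: str, image: np.ndarray) -> str:
    """Stand-in for the high-level model: reason over the prompt and scene,
    return the next short instruction (e.g. 'pick up the tomato')."""
    # A real system would query a vision-language model here.
    return "pick up the tomato"

def low_level_policy(subtask: str, image: np.ndarray) -> np.ndarray:
    """Stand-in for the low-level policy: map (subtask, image) to a chunk
    of motor commands covering roughly the next half second."""
    chunk_len, action_dim = 25, 14             # e.g. 50 Hz control, two 7-DoF arms
    return np.zeros((chunk_len, action_dim))   # placeholder actions

def run_episode(prompt: str, get_image, send_actions, max_steps: int = 100):
    """Alternate between high-level reasoning and low-level control."""
    for _ in range(max_steps):
        image = get_image()
        subtask = high_level_step(prompt, image)    # can honor "hold the tomatoes"
        actions = low_level_policy(subtask, image)  # short action chunk
        send_actions(actions)

# Toy wiring so the sketch runs end to end.
run_episode(
    prompt="make me a sandwich, but hold the tomatoes",
    get_image=lambda: np.zeros((224, 224, 3)),
    send_actions=lambda a: None,
    max_steps=3,
)
```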

22:35

On the technical capabilities side, one thing I was wondering about a little bit was: if I look at the world of self-driving, there's a few different approaches being taken, and one of the approaches, the more kind of Waymo-centric one, is really incorporating a variety of other types of sensors besides just vision. We have LIDAR and a few other things as ways to augment the self-driving capabilities of a vehicle. Where do you think we are in terms of the sensors that we use in the context of robots?

23:07

So we've gotten very far just with vision, with RGB images even, and we typically will have one or multiple external, kind of what we call base cameras, that are looking at the scene, and also cameras mounted to each of the wrists of the robot. We can get very, very far with that. I would love it if we could give our robot skin. Unfortunately, a lot of the tactile sensors that are out there are either far less robust than skin, far more expensive, or very, very low resolution. So there's a lot of kind of challenges on the hardware side there. And we found that actually mounting RGB cameras to the wrists ends up being very, very helpful, and probably gives you a lot of the same information that tactile sensors can give you.

23:49

Because when I think about the set of sensors that are incorporated into a person, obviously, to your point, there's the tactile sensors, effectively, right? And then there's heat sensors; there's actually a variety of things that are incorporated that people usually don't really think about much.

24:02

Absolutely.

24:04

And I'm just sort of curious, like, how many of those are actually necessary in the context of robotics versus not? What are some of the things we should think about, like, just if we extrapolate off of humans or animals or other, you know.

24:19

It's a great question. I mean, for the sandwich making, you could argue that you'd want the robot to be able to taste the sandwich to know if it's good or not.

24:28

Or smell it, at least, you know.

24:30

Yeah, I've made a lot of arguments for smell to Sergey in the past, because there's a lot of nice things about smell, although we've never actually attempted it before. And I think, like, audio, for example: like a human, if you hear something that's unexpected, it can actually kind of alert you to something. In many cases, it might actually be very, very redundant with your other sensors, because you might be able to actually see something fall, for example, and that redundancy can lead to robustness. For us, it's currently not a priority to look into these sensors, because we think that the bottleneck right now is elsewhere: it's on the data front, it's on kind of the architectures and so forth. The other thing I'll mention is that, actually, right now most of our policies do not have any memory. They only look at the current image frame; they can't remember even half a second prior. And so I would much rather add memory to our models before we add other sensors. We can have commercially viable robots for a number of applications without other sensors.
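
To illustrate what "no memory, only the current frame" means in practice, here is a small, assumption-based sketch of one common way memory gets added: stacking the last few observations into a fixed-size history before handing them to the policy. The class and buffer size are illustrative, not anything Physical Intelligence has described.

```python
from collections import deque
import numpy as np

class FrameHistory:
    """Keep the last k observations so a policy can condition on recent context
    instead of only the current frame (which is all a memory-less policy sees)."""

    def __init__(self, k: int = 4):
        self.frames = deque(maxlen=k)

    def push(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)
        # Pad with copies of the oldest frame until the buffer is full,
        # so the stacked input always has a fixed shape.
        while len(self.frames) < self.frames.maxlen:
            self.frames.appendleft(self.frames[0])
        return np.stack(self.frames)  # shape: (k, H, W, C)

history = FrameHistory(k=4)
for t in range(3):
    stacked = history.push(np.zeros((224, 224, 3)))
print(stacked.shape)  # (4, 224, 224, 3)
```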

25:30

What do you think is the time frame on that?

25:32

I have no idea. There are some parts of robotics that make it easier than self-driving and some parts that make it harder. On one hand, it's harder because it's just a much higher-dimensional space: even our static robots have 14 dimensions, seven for each arm. You need to be more precise in many scenarios than driving. We also don't have as much data right off the bat. On the other hand, with driving, I feel like you kind of need to solve the entire distribution to have anything that's viable. You have to be able to handle an intersection at any time of day, or with any kind of possible pedestrian scenario, or other cars and all that. Whereas in robotics, I think that there's lots of commercial use cases where you don't have to handle this whole huge distribution, and you also don't have as much of a safety risk as well. That makes me optimistic, and I think that also, like, all the results in self-driving have been very encouraging, especially, like, the number of Waymos that I see in San Francisco.

26:33

Yeah, it's been very impressive to watch them scale up usage. I think what I found striking about the self-driving world is that there were two dozen startups started roughly, I don't know, 10 to 15 years ago around self-driving. And the industry has largely consolidated, at least in the US, and obviously the China market's a bit different, but it's consolidated into Waymo and Tesla, which effectively were two incumbents, right? Google, and Tesla was an automaker. And then there's maybe one or two startups that either SPAC'd and went public or are still kind of working in the area. And then most of it's kind of fallen off, right? And from the set of players that existed at that starting moment, it's just been consolidation. Do you think that the main robotics players are the companies that exist today? And do you think there's any sort of incumbency bias that's likely?

27:23

A year ago, like, it would be completely different, and I think that we've had so many new players recently. I think that the fact that self-driving was like that suggests that it might have been a bit too early 10 years ago for it. And I think that arguably it was; like, I think deep learning has come a long, long way since then. And so I think that that's also part of it. And I think the same with robotics: like, if you were to ask 10 years ago, or even five years ago, honestly, I think it would be too early. I think the technology wasn't there yet. We might still be too early, for all we know. I mean, it's a very hard problem, and, like, how hard self-driving has been, I think, is a testament to how hard it is to build intelligence in the physical world. In terms of, like, major players: I've liked the startup environment, and a lot of things are possible that were very hard to do when I was at Google. Google is an amazing place in many, many ways, but, like, as one example, taking a robot off campus was, like, almost a non-starter, just for code security reasons. And if you want to collect diverse data, taking robots off campus is valuable. You can move a lot faster when you're a smaller company, when you don't have kind of restrictions, red tape, that sort of thing. The really big companies, they have a ton of capital so they can last longer, but I also think that they're going to move slower too.

28:47

If you were to give advice to somebody thinking about starting a robotics company today, what would you suggest they do, or where would you point them in terms of what to focus on?

28:58

I think that, actually, like, trying to deploy quickly and learn and iterate quickly, that's probably the main advice: try to, yeah, like, actually get the robots out there and learn from that. I'm also not sure if I'm the best person to be giving startup advice, because I've only been an entrepreneur myself for 11 months, but yeah, that's probably the advice.

29:17

Thank you. Yeah, that's cool. I mean, you're running an incredibly exciting startup, so I think you have a full ability to suggest stuff to people in that area, for sure. One thing I've heard from a number of different groups is that they're using observational data of people as part of the training set for that purpose. How do you think about that in the context of training robotic models?

29:39

I think that it can have a lot of value, but I think that by itself it won't get you very far. And I think that there's actually some really nice analogies you can make, where, for example, if you watch, like, an Olympic swimmer race, even if you had their strength, just their practice at moving their own muscles to accomplish what they accomplish is, like, essential for being able to do it. Or if you're trying to learn how to hit a tennis ball well, you won't be able to learn it by kind of watching the pros.

30:09

No.

30:11

Maybe these examples seem a little bit contrived because they're talking about, like, experts. The reason why I make those analogies is that we humans are experts at motor control, low-level motor control, already for a variety of things, and our robots are not. And I think the robots actually need experience from their own body in order to learn. And so I think that it's really promising to be able to leverage that form of data, especially to expand on the robot's own experience, but it's really going to be essential to, like, actually have the data from the robot itself.

30:42

And is that just general data that you're generating with that robot, or would you actually have it mimic certain activities, or how do you think about the data generation? Because you mentioned a little bit about the transfer and generalizability, and it's interesting to ask, well, what is generalizable or not, and what types of data are, and things like that.

31:01

I mean, when we collect data, it's kind of like puppeteering, like the original ALOHA work, and then you can record both the camera images and the actions, and so that is, like, the experience for the robot. And then I also think that autonomous experience will play a huge role, just like we've seen in language models: after you get an initial language model, if you can use reinforcement learning to have the language model bootstrap on its own experience, that's extremely valuable. Yeah, and then in terms of what's generalizable versus not, I think it all comes down to the breadth of the distribution. It's really hard to quantify or measure how broad the robot's own experience is, and there's no way to categorize the breadth of the tasks, like how different one task is from another, how different one kitchen is from another, that sort of thing. But we can at least get a rough idea of that breadth by, like, looking at things like the number of buildings or the number of scenes, those sorts of things.
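
As a rough sketch of the "puppeteering and recording" setup she mentions, the loop below logs camera images alongside the operator's commanded joint positions at each control step; the function names, camera count, and control horizon are all assumptions for illustration.

```python
import numpy as np

def record_teleop_episode(read_cameras, read_leader_joints, horizon: int = 500):
    """Illustrative teleop logging loop: at each control step, store the camera
    images alongside the operator's commanded joint positions. The resulting
    (observation, action) pairs are the robot's own experience for training."""
    episode = {"images": [], "actions": []}
    for _ in range(horizon):
        episode["images"].append(read_cameras())          # base + wrist views
        episode["actions"].append(read_leader_joints())   # what the puppeteer commanded
    return {k: np.stack(v) for k, v in episode.items()}

# Toy stand-ins for the hardware interfaces, just so the sketch runs.
episode = record_teleop_episode(
    read_cameras=lambda: np.zeros((2, 224, 224, 3), dtype=np.uint8),  # two camera views
    read_leader_joints=lambda: np.zeros(14, dtype=np.float32),        # two 7-DoF arms
    horizon=10,
)
print(episode["images"].shape, episode["actions"].shape)  # (10, 2, 224, 224, 3) (10, 14)
```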

31:57

And then, I guess we talked a lot about humanoid robots and other sorts of formats. If you think ahead in terms of the form factors that are likely to exist in N years as this sort of robotic future comes into play, do you think there's sort of one singular form, or are there a handful? Is it a rich ecosystem, just like in biology? Like, how do you think about what's gonna come out of all this?

32:21

I don't know exactly, but I think that my bet would be on something where there's actually a really wide range of different robot platforms. I think Sergey, my co-founder, likes to call it a Cambrian explosion of different robot hardware types and so forth, once we actually have the technology, the intelligence, that can power all those different robots. And I think it's kind of similar to how we have all these different devices in our kitchen, for example, that can do all these different things for us, rather than just, like, one device that cooks the whole meal for us. And so I think we can envision, like, a world where there's, like, one kind of robot arm that does things in the kitchen, that has, like, some hardware that's optimized for that and maybe also optimized to be cheap for that particular use case, and another piece of hardware that's kind of designed for, like, folding clothes or something like that, dishwashing, those sorts of things. This is all, like, speculation, of course, but I think that a world like that is something that is, yeah, I think, different from what a lot of people think about.

33:26

In the book The Diamond Age, there's sort of this view of, like, matter pipes going into homes, and you have these 3D printers that make everything for you. And in one case you're, like, downloading schematics and then you 3D print the thing, and then people who are kind of bootlegging some of the stuff end up with almost evolutionary-based processes to build hardware, where selecting against certain functionality is the mechanism by which to optimize. So maybe you don't need that much specialization if you have enough generalizability in the actual underlying intelligence.

33:58

I think a world like that is very possible. And I think that you can make a cheaper piece of hardware if you are optimizing for a particular use case, and maybe it'd also be a lot faster and so forth. Yeah, obviously very hard to predict.

34:14

Yeah, it's super hard to predict, because one of the arguments for a smaller number of hardware platforms is just supply chain, right? It's just going to be cheaper at scale to manufacture all the subcomponents, and therefore you're going to collapse down to fewer things because they're easily scalable, reproducible, cheap to make, etc., right? If you look at sort of general hardware approaches. So it's an interesting question in terms of that tradeoff between those two tensions.

34:37

Yeah, although maybe we'll have robots in the supply chain that can manufacture any customizable device that you want.

34:43

It's robots all the way down. So that's our future. Yeah. Well, thanks so much for joining me today. It's a super interesting conversation; we covered a wide variety of things. I really appreciate your time.

34:53

Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
