Episode Transcript
0:01
You're listening to Gradient Dissent, a show about making machine learning work in the real world, and I'm your host, Lukas Biewald. Sualeh Asif is the CPO and co-founder of Cursor, one
0:15
of the best loved and
0:17
most exciting and popular AI
0:19
products out there. It helps
0:21
you with coding, helps you
0:23
use LLMs to do coding. I
0:26
use it all the time, and I really love
0:28
it. And I was just excited to ask him
0:30
about how he built such a great product. I
0:32
found his answers super interesting, and I hope you
0:34
enjoy this interview. All
0:38
right. Well, thanks so much
0:40
for taking the time to
0:42
talk. I guess maybe this is
0:44
a softball question, but I
0:46
was really interested in just hearing
0:48
the story of Cursor, like
0:50
how you started it, what
0:53
the moment was where it really started to
0:55
take off because, you know, now it's like one
0:57
of the most loved products I think out
0:59
there. I
1:01
mean, history comes from,
1:04
we had been
1:06
really interested in sort
1:08
of scaling laws
1:11
and back in college,
1:13
sort of, I had gone on and worked
1:16
on a sort of search engine type company
1:18
with a friend, and there
1:20
we were really bullish on language models, because
1:22
it felt like language models could
1:25
really compress all the world's information, and
1:27
there should be this end -to -end
1:29
index of searching the internet. Instead
1:32
of many of the heuristics we have coded
1:34
in over the years, it felt like
1:36
you could sort of, that should be
1:38
the end-to-end way of doing things. So, scaling laws, doing the search engine, training large models at the time. I think Copilot was the first really big moment for us, where
1:53
it was this project that was truly
1:55
magical. It was fast. It
1:58
felt like it kind of knew you. But then Copilot did not improve much over
2:03
the coming year or two. And
2:06
for us, when we saw GPT-4, we thought the ceiling for what a really, really great product could be at that moment was really high. And then it was like
2:20
pretty clear that, like, as the models got much better, you know, as scaling laws progress and models get much better, the product that can be built in the future has an even higher ceiling. And that was
2:32
like, it was just this sort of super attractive thing to go toward. And, you know, we're all coders at heart, and we wanted to be building things that we use every day. And
2:45
you know, cursor was originally built
2:47
for ourselves in many ways. It
2:49
was, and it
2:52
was sort of fun seeing that, you know,
2:54
everyone else really liked it. It was definitely built
2:56
for ourselves. And
2:59
we were sort of experimenting. So a lot of
3:01
the early culture of the company was experimenting
3:03
with various different ways of using the models. Should there be a document where you're sort of typing things out and the model is coding things? If
3:12
you want to do this next action prediction: you're in a location, what should be
3:18
the edit? Maybe the model should be telling
3:20
you where to go next. You should be
3:23
able to make edits over your entire
3:25
repository. Some
3:27
of those things have
3:29
taken a year, a year and a
3:31
half, several iterations,
3:33
and some of them
3:36
we've continued building on. Now,
3:38
one of the core parts of the product is this next action prediction thing, where
3:43
it predicts your next
3:45
edit at the correct location and then where you
3:47
should be going next, and
3:49
people really, really love that feature. Then
3:52
we're working our way towards,
3:55
you should just be able to make any edit you
3:57
want across the entire repository, like
3:59
codebase-wide. Obviously,
4:02
there are some problems along the way that we'll talk about, some easy, some, you know, sort of still quite difficult. Like, models still struggle with what exactly
4:13
the architecture of the repository is.
4:15
If you, you know, ask,
4:17
what is the architecture of the
4:19
repository that is really quite
4:21
difficult because it requires sort of
4:24
looking at potentially billions of
4:26
tokens, tens of billions of tokens
4:28
and say, asking the question,
4:30
what is really going on as
4:32
opposed to like, you could
4:34
like list the function names, right? But that doesn't really tell you, you know, what is exactly going on. Well, totally,
4:42
I want to dive into that as much as
4:44
you're comfortable sharing. But I guess I wanted to
4:46
ask you, you know,
4:48
one of the surprising things that I learned
4:50
in my, you know, background research on you
4:52
is I think you guys came from using
4:55
Vim, not VS Code. Is that, is that
4:57
right? All of us were
4:59
really early users of Vim. We did eventually, you know, move to VS Code. Probably the last ones to, there were a couple of us. Aman and Arvid probably were the last to switch over from Vim to VS Code, and the trigger there was GitHub Copilot. Oh, I
5:13
see. So GitHub Copilot actually
5:15
pulled you over in the end.
5:17
So I had switched over before,
5:20
but then Aman and Arvid only
5:22
switched over after GitHub Copilot came out. It was just the killer feature, right? In some ways, it was the killer feature. Totally. And why
5:30
doesn't something like Vim actually have
5:32
something like what you guys built?
5:34
It seems like a lot of
5:36
smart coders like to use it.
5:39
Is there something about it like a
5:41
graphical interface that lends itself to
5:43
this kind of structure, like coding with
5:45
an AI? I
5:47
think for us, VS Code
5:50
is, for one, it's pretty
5:52
clear the most loved platform on
5:54
the internet for coders; it's the thing that sort of is the de facto standard. It's the default. And
6:03
we wanted to sort of
6:05
incrementally evolve it towards the
6:07
world where you're starting to
6:09
automate coding. And
6:11
the cursor of one year from now should
6:14
look very different from the cursor of today,
6:16
which means almost by default, it should
6:18
not look exactly like VS Code. In
6:23
looking very different, we didn't want it to just be a text box where you prompt for code, because coders still want to type characters. You
6:33
want to be able to edit your
6:35
entire repository at a higher level, but
6:38
at some point, if you
6:40
find that there's a change that you can
6:42
quickly execute in 10 keystrokes, we want to let
6:44
you be able to dive into the details at any point in time. Maybe a year from now, right, like, humans are editing some pseudocode representation. And
6:59
that's really quick to edit, and the model is sort of working for you in the background. But if you're writing some kernel and you want to go in and tweak some of the indices, it's much easier to do it by hand. Then you always, I
7:11
think developers will want this ability to go
7:13
in and, you know, unless
7:16
we truly believe that everything is going away,
7:18
it's like you really, really want the fine
7:20
-grained control. Yeah, yeah, that
7:22
makes sense You know one thing that
7:24
that strikes me from what you were saying
7:26
earlier about Like observing that co -pilot was
7:28
really great and there's all these you
7:30
know all this opportunity and how do you
7:32
kind of work with these AI models
7:35
is I think a lot of people other
7:37
people thought that at the same time
7:39
so you know like you know you had
7:41
this idea that I think of many
7:43
people had including a bunch of like YC
7:45
companies and other products that I saw
7:47
and it seemed like cursor emerged
7:50
as kind of the winning one among
7:52
these, right? So it seems like there was
7:54
great product execution here, which I'm always
7:56
really interested in. Like, do you have a
7:58
sense for what you were doing differently
8:00
than your competitors that made your product work
8:03
so well? Was it like, was it
8:05
like certain decisions or was it like a
8:07
process? You know, "why" questions are really hard. I don't know. It's
8:13
very hard to tell exactly what we
8:15
did, right? I think there was a
8:17
bunch of things where we
8:19
always tried to push the ball
8:21
as much as possible without being...
8:23
We always wanted to be the
8:25
most useful product at any moment
8:27
in time. It's like at the
8:30
frontier. It's very
8:32
easy to overpromise and underdeliver.
8:35
And a lot of what we have
8:37
tried to do is to... We
8:39
didn't ship the agent until we were
8:41
very confident it was something that
8:43
was really useful. And we
8:45
had probably done three agent
8:47
prototypes before that that we didn't ship, because some version of the model would just lose track, and you could make something that could
8:59
help you in the short term
9:01
and really hurt what people think
9:03
of as a reliable product in
9:06
the long term. Maybe
9:08
that is part of it. It was always, but
9:10
then also I think we've been first to a
9:12
lot of the inventions that people really like. So
9:15
more recently,
9:19
the ability to jump to the
9:21
next location that should be
9:23
edited is something we've had for
9:25
closer to eight months, 10
9:28
months, a year, and
9:30
we hopefully will release a much
9:32
more upgraded version of it soon that
9:34
will be quite a bit better. And
9:38
only recently, you know, other people have
9:40
tried to do that. So we've always
9:42
tried to be like, think
9:44
of what's coming and at least have a prototype as
9:47
soon as we think it's something that should
9:49
be useful. There's the tab-to-jump feature. There's the apply feature. And
9:56
we also like, you know, we've done this at
9:58
scale also. So I think that that has benefited
10:01
where, for example, for
10:03
our custom tab model, we
10:05
do something like 100 million requests
10:07
a day, and that's quickly growing. I think part of doing
10:11
it well has been able to do it
10:13
reliably for lots and lots and lots of
10:15
people. Do you
10:18
think that any of the data that
10:20
you have or the feedback that
10:22
you have from users is part of
10:24
your success, or are you more
10:26
making decisions through your own experience? The
10:29
data has definitely been
10:31
enormously useful. I
10:34
think the feedback loops
10:36
that people consider obvious are
10:38
indeed extremely useful. You
10:40
want to be the company
10:42
that builds
10:44
an extremely good product
10:46
that everyone loves, and
10:48
then that definitely helps in making the next version
10:50
even better. It helps in
10:52
small ways, it helps in training
10:55
models, it helps in, yeah. Where
10:59
the small ways are, you understand
11:01
how people are using your product, what
11:03
is the most important thing to
11:05
ship at any moment, and then in
11:07
big ways in training models and
11:09
improving just the core workflows. For
11:12
example, technically speaking, one loop
11:14
in the apply use case
11:16
is, you're
11:18
creating your first version of an apply model that is quite a bit bigger, and
11:22
you then deploy it for all users, you
11:25
get lots and lots of data, and then
11:27
you can distill a slightly smaller model. That
11:29
gets faster, the people use it even
11:31
more, and you then distill an even
11:33
smaller model, and you can keep compressing the
11:35
models down because you're generating the data that allows
11:37
you to do that.
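To make that loop concrete, here is a minimal Python sketch of a distill-and-redeploy cycle; every name (collect_apply_traces, finetune, deploy) and the model sizes are illustrative stubs under assumption, not Cursor's actual pipeline.

# Minimal sketch of the distill-and-redeploy loop described above.
def collect_apply_traces(model_name: str, days: int) -> list[dict]:
    """Stand-in for logging (prompt, full-file rewrite) pairs from production."""
    return [{"prompt": "...", "target": "..."} for _ in range(1000)]

def finetune(student_size: str, traces: list[dict]) -> str:
    """Stand-in for supervised distillation onto a smaller student model."""
    return f"apply-{student_size}"

def deploy(model_name: str) -> None:
    print(f"serving {model_name} to all users")

model = "apply-large"                     # first version: bigger and slower
for student_size in ["medium", "small"]:  # each round distills a smaller student
    deploy(model)
    traces = collect_apply_traces(model, days=30)  # usage data from the deployed teacher
    model = finetune(student_size, traces)         # smaller model -> faster -> more usage
deploy(model)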
11:42
But also, yeah, it's just this feedback loop, and then some of the things get faster. So for now, up to, I don't know, a thousand or two thousand line file, apply feels effectively instant, and that's how we wanted it to feel. Right, right, we wanted it to feel like apply is deterministic, you know, like they figured out some deterministic algorithm to place the block, but that's not actually what's happening. It's a model that is actually rewriting the entire file. And a lot of the improvements have been making the model smaller. There's obviously improvements in just making the inference much faster when doing these speculative edits.
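A toy sketch of the speculative-edits idea, under the assumption that since most of a full-file rewrite matches the original file, the original can serve as the draft that gets verified in chunks; model_step is a stub, and in a real system each chunk would be verified in one batched forward pass rather than character by character.

TARGET = "def add(a, b, c):\n    return a + b + c\n"   # what the model wants to emit
ORIGINAL = "def add(a, b):\n    return a + b\n"        # current file, used as the draft

def model_step(prefix: str) -> str:
    """Stand-in for one decoding step: emit the next character of the target."""
    return TARGET[len(prefix)] if len(prefix) < len(TARGET) else ""

def speculative_rewrite(draft: str, chunk: int = 8) -> tuple[str, int]:
    out, passes = "", 0
    while True:
        passes += 1
        for ch in draft[len(out):len(out) + chunk]:   # verify a chunk of draft characters
            if model_step(out) != ch:
                break
            out += ch
        nxt = model_step(out)                         # one "real" decode step
        if not nxt:
            return out, passes
        out += nxt

rewritten, passes = speculative_rewrite(ORIGINAL)
print(rewritten == TARGET, passes)   # same output as plain decoding, in fewer passes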
12:16
For
12:18
something like the agents that
12:20
you talked about, you had some
12:23
iterations where it wasn't useful
12:25
enough to ship. How did
12:27
you know that it wasn't good enough to ship?
12:29
How did you think about that? I didn't use
12:31
it on a daily basis. I think these things
12:33
are really quite easy to figure out. If,
12:36
you know, you're coding 10 hours a
12:38
day in Cursor, right? Like, you boot up the editor, you're making the improvements, and you're seeing it on a daily basis. If the devs themselves don't use it
12:47
every single day, it's probably not something
12:49
that everyone else will want to use.
12:51
I mean, there's obviously corner cases to
12:53
this thing, where we're not the perfect
12:55
coders, but, you know, a thing like
12:57
an agent is such a general feature
12:59
that if you're not using it, it's
13:01
almost certainly not useful. That actually leads
13:03
me to another question I had, which
13:06
is you and your co -founders have
13:08
this background in sort of competitive coding,
13:10
right? Like, you know, does that, do
13:12
you think that's an advantage for you? Because
13:14
I could imagine that that might sort of
13:16
put you at sort of the forefront of
13:19
like wanting to be efficient in coding. But
13:21
I could also imagine that you might have
13:23
any idiosyncrasies in the way that you want
13:25
to write code that might be different than
13:27
your general user. I think we're not only
13:29
competitive coders. Like, we did competitive math and
13:31
coding because, you know, that's sort of part
13:33
of the background. It's always
13:35
really hard to distinguish, you know,
13:37
what part of your identity is
13:39
the most important, but many of
13:41
us had worked at sort of
13:44
software companies before, Stripe and the
13:46
like. And you
13:48
had some idea that production coding was
13:50
very different. And
13:52
then people had actually built products.
13:54
So I think Michael had spent
13:56
quite a bit of time building these
13:58
high-performance systems. And,
14:01
you know, we have done modeling work. So
14:03
we had seen quite a wide variety of coding.
14:07
So bringing it back, did competitive programming really, really affect how you did coding on a day-to-day basis? Like, not
14:15
really. It was
14:17
just, I think we knew what engineering
14:19
was. We were sort of doing day -to
14:21
-day engineering. And you could see if
14:24
the agent was helpful. And in this
14:26
case, it was very clear that, for
14:28
example, early iterations were not really that
14:30
useful. It was very slow. One
14:33
of the most important things that had
14:36
changed there is the length of the
14:38
context windows that you can do on
14:40
every single keystroke on every single request.
14:43
When the model started out, you would
14:45
start doing these 4k, 8k context windows.
14:48
Even if the model slightly supported them,
14:50
the models were not very good at
14:52
using the large context windows. Now
14:54
that the cost curve has gone down for the language models, you can easily do requests on the order of, you know, 50,000 tokens, 60,000 tokens reliably, and that
15:09
has like enormously helped. One
15:12
way, one intuition to have here is
15:14
if the model can't even read your current file,
15:16
like it would not be very useful,
15:18
let alone like read the rest of your
15:20
repository or do searches or lots of
15:22
things that you expect a basic agent to
15:24
be able to do. That
15:26
wouldn't work with 8k tokens. 8k tokens,
15:28
what can you even fit in there?
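A quick back-of-the-envelope for that question; the tokens-per-line figure is a rough assumption that varies by language and tokenizer, not a measured Cursor statistic.

context_tokens = 8_000
tokens_per_line = 10                       # rough average for source code
print(context_tokens // tokens_per_line)   # ~800 lines: often less than one large file,
                                           # before any instructions, history, or search results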
15:32
Another interesting thing that you guys said in
15:34
your interview with Lex Fridman is that
15:36
you kind of wanted the experience of
15:38
using a code editor to be fun,
15:40
which I thought was kind of a
15:42
cool idea, like a little bit surprising,
15:45
right? It seems like such a utilitarian
15:47
thing. It kind of reminded me that
15:49
when I... I remember when I switched, in Cursor, my default LLM from Sonnet to o1. Actually,
15:57
I think I started coding a little less.
15:59
I was actually having a lot less fun
16:01
because the latency was higher. It took me
16:03
a little while to realize that, but I
16:05
actually, for some reason, Sonnet had lower latency,
16:07
and it made it just a lot more
16:09
fun. I was like, you know what? I
16:11
just need to go back to the LLM
16:13
where I was just enjoying writing code. I
16:15
do actually relate to what you're saying, but
16:18
I'm curious how that idea of a
16:20
fun experience shows up
16:22
in your utilitarian feeling
16:24
application. I
16:26
think it just, I mean, there's
16:28
always this sort of end metric, right?
16:30
The end metric is how much we
16:33
are enjoying using the model. And it's
16:35
been very clear that we enjoy using
16:37
Sonnet more than o1. And
16:41
part of it is there's a
16:43
few things. So one is, I
16:46
think, Sonnet is, even at scale, like, reliably
16:50
quite fast. And
16:52
I think we want to ship
16:54
models that are even faster,
16:56
that are better than Sonnet, that
16:58
have much longer context windows, that could, you know, do edits reliably over a much larger part of your codebase,
17:05
for exactly the same reason, because it
17:07
becomes much more fun. So
17:09
it's, in some sense, it's this
17:12
hard to pin down feeling, but
17:14
in some sense, you know what
17:16
really affects it. Like, you will
17:18
get bothered if you have to
17:20
explain to the model again and
17:22
again what you're doing, or you
17:24
will get bothered if the model
17:26
doesn't really understand that you had
17:29
some easily viewed file open and
17:31
the model doesn't see it. And
17:33
that shows up as annoyance. It's just straight-up annoying. So you can turn it into some technical thing that you can track down.
17:43
But some of the inventions are just like, you
17:46
know, wouldn't it be more fun if blah,
17:48
blah, blah happened? Like, wouldn't it be more
17:50
fun if you were coding and the model
17:52
would just, once you started doing a refactor, you could tab, tab, tab through the entire thing, like 10 tabs, what would that take?
17:59
And then once you think like, oh, you
18:01
know, 10 tabs would make me feel
18:03
really, really happy, you can then sort of
18:05
reverse engineer the exact thing, like the modeling work that you would have to
18:10
do, so like, what size of the model
18:12
you want to train, how
18:14
much time you want to spend sort of pre
18:18
-training, post -training, and RL -ing
18:20
the models to be able
18:22
to consistently do the same
18:24
behavior again and again. Another
18:29
concrete example is you
18:31
could always over -train the
18:33
tab models to be annoying.
18:37
So part of this, if you were
18:39
to only worry about making sure that
18:41
every single time it does the edit, you
18:44
would overpredict. Like, sometimes you really want to
18:49
like, be writing some kernel. You
18:51
want to spend some time thinking
18:54
and you don't want the tab model
18:56
bothering you and that's the thing
18:58
you would only care about if you're
19:00
making it fun and enjoyable as
19:02
opposed to something that, like... yeah, something that's, you know, obviously just always overpredicting. But this is a
19:09
pretty subjective experience
19:11
that you probably couldn't pull from
19:13
user data. How do you work through
19:15
that internally? Do you ever have
19:17
a difference of opinion with yourselves around
19:19
what's the more fun approach? Yes. I
19:23
think some of these transitions are subjective,
19:25
but I think if you think it
19:27
out, they're not always that controversial. Interesting.
19:30
At the end of the day, you're trying
19:33
it out. There's
19:35
always some intuition where you might over-index
19:37
in some direction, but for the most part, I
19:40
think there's not that much
19:42
argument, or is Sonnet more fun,
19:44
or is O1 more fun?
19:46
I mean, Sonnet is arguably winning.
19:49
Hopefully, there'll be more models that are
19:51
optimized towards keeping you in the
19:54
flow. I think you need
19:56
two categories of models. You
19:58
need the category of models that
20:00
is RL'd towards being fast, having super large context windows, and just making edits across the entire codebase.
20:08
Make you feel like you're breezing
20:10
through things. And you want a
20:12
category of models that is trained
20:14
for being extremely careful for reviewing
20:16
every single small thing before they
20:18
make the edit. Maybe
20:21
they do a bunch of research, they then make the edit in the background for
20:25
you and then come back to you
20:27
with a PR. And in that case, the
20:29
thing that will be fun is if
20:31
they're more correct than not. And
20:34
then fast is not the only thing
20:36
that's fun. It's being correct or how
20:38
they write it out, how they prove to you that they're doing the right thing. I
20:43
guess as you kind of
20:45
build a bigger brand and
20:47
you build trust with users
20:49
like me, why are
20:51
you even asking me what model I want
20:53
to use? I'm sort of aware of the
20:55
different models, but I would sort of trust
20:58
you more to know what's going to be
21:00
fun and useful for me. Yeah,
21:02
I think you're kind of right. Part
21:09
of building the trust was always showing
21:11
exactly what we're using, and I think you're probably correct that we should have a default mode, and you should use the default and feel happy. But if you're the kind of person that wants to do more and wants to perfectly fine-tune every single thing, you should be able to do that, and then there should be the simple default. So there should be a release in a week or two that fixes all of this for you. Oh
21:36
Here's something I've been wondering about myself
21:39
quite a bit. Do you
21:41
think that, do you have best practices for changing the structure
21:45
of my own code base
21:47
or the way that I
21:49
should code to make your
21:51
product work even better? For
21:53
example, we have one engineer that's
21:55
been sort of letting the LLM
21:57
put in notes inside the code
21:59
base of helpful things to kind
22:01
of help understand the code base.
22:03
That's one of the things that we
22:05
did. We've been sort
22:08
of speculating about this; we don't
22:10
actually have a really correct solution
22:12
there, but this idea of
22:14
like maybe there should be a
22:16
readme.ai.md in every folder. With
22:19
the idea being at any point in
22:21
time, if you ask for changes around a
22:23
folder, the model should be able to
22:25
look up what's the nearest place where
22:27
there's an architecture written down that it
22:29
can read.
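A minimal sketch of that lookup, assuming a hypothetical readme.ai.md convention: given the file being edited, walk up the directory tree and return the nearest architecture note that could be pulled into the model's context. The file name and layout are assumptions for illustration, not a shipped Cursor feature.

from pathlib import Path

def nearest_architecture_note(edited_file: str, note_name: str = "readme.ai.md") -> Path | None:
    for folder in Path(edited_file).resolve().parents:
        candidate = folder / note_name
        if candidate.is_file():
            return candidate        # closest ancestor folder that documents its architecture
    return None

# e.g. nearest_architecture_note("src/server/handlers/users.py")
# -> src/server/readme.ai.md, if that is the closest folder with a note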
22:31
Sort of on the technical side, the thing to understand is that the models are much faster at reading
22:35
tokens than humans. And
22:38
like orders of magnitude faster at sort of ingesting these tokens. But
22:44
humans have, for example,
22:46
like some small things memorized.
22:49
So there are obviously small differences between
22:51
how we code, but the model is starting from scratch every time. So Cursor Tab in our codebase is named CPP, for Copilot++, and the model always sort of needs to be reminded that whenever I say Cursor Tab, you should actually search for Copilot++, or something like that. So
23:15
there are these facts and rules
23:17
that are quite important. I don't want the default to be that it would be better if everyone sort of changed their way of coding. I think the obviously better approach is that we just figure it out. We
23:31
should just spend all the
23:33
time and energy we need, all
23:35
the computing we need to
23:37
really nail down the architecture that
23:39
you have, really figure out
23:41
all the facts and rules. I
23:44
don't know if I have any interesting controversial ideas
23:46
for how that should be done. Someone
23:48
was joking that, you know,
23:50
maybe we should email you 10 rules in the
23:53
morning and you'll just, like, say yes or no on the 10 rules, and hopefully that'll build up the corpus over time. Like, you want a system
23:59
that allows you to add rules and then prune
24:01
bad rules. Like sometimes there will be, like
24:03
if you just ask the model to look at
24:05
a PR and give you some rules, sometimes
24:07
it will come up with bad rules and you
24:09
need a way of pruning them out. So, like, what is the minimal set of rules such that all your PRs become much easier?
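A hedged sketch of that add-then-prune loop: propose rules (for example from a PR), keep a running score of whether each rule actually helped, and drop the ones that don't earn their keep. The field names and thresholds are assumptions for illustration, not a real Cursor feature.

from dataclasses import dataclass

@dataclass
class Rule:
    text: str          # e.g. "Cursor Tab is called Copilot++ in this repo"
    shown: int = 0     # times the rule was included in context
    helped: int = 0    # times the resulting edit or review went well

def prune(rules: list[Rule], min_shown: int = 20, min_help_rate: float = 0.1) -> list[Rule]:
    """Keep rules that are still unproven or that are pulling their weight."""
    return [r for r in rules
            if r.shown < min_shown or r.helped / r.shown >= min_help_rate]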
24:22
Then there's the model that needs to look at all of the rules. I mean, we're still
24:26
sort of figuring it out, but I
24:28
think there's something important at the
24:30
core of this that is both in
24:32
terms of how humans would change and also
24:34
in terms of what we should change
24:36
just to make the defaults much better, because
24:38
not every single person will change. Of
24:41
course, but for example, do you
24:43
think smaller file sizes are better because
24:45
the model can more easily navigate
24:47
the code hierarchy or do you think
24:49
that creates complexity? There's
24:53
always some trade -offs. The funny
24:55
joke is that sometimes... people
24:58
will sort of keep adding to the same file
25:00
more and more until the model can't edit it
25:02
anymore, and then you just ask the model to
25:04
refactor that file for you, because you're just like
25:06
sort of, you know, in Cursor
25:08
terminology, you know, composing the file more and
25:10
more. And
25:12
it seems pretty clear to me that
25:14
there is obviously some advantage of
25:16
the model seeing all the context relevant to the current task in the same file, and also that, for future tasks, it'll be easier if the files are smaller. I
25:27
think infrastructure -wise, we will also make
25:29
it possible for you to sync
25:31
all of these files to a remote
25:34
server. So we will have a
25:36
big enough copy of your code base
25:38
at some point. So right now,
25:40
we're extremely privacy -conscious, and that means
25:42
we try to make sure that we
25:44
never store any code past the
25:47
life of your request. Ideally,
25:49
in the future, we can store at
25:51
least some part of it in a
25:53
private way that allows the model to
25:55
very quickly do reliable edits. So you
25:57
shouldn't have to make the round trips
26:00
for making every single small edit that
26:02
feels quite bad. What
26:05
else? You were telling me that
26:07
you run infrastructure. Also,
26:10
can you talk about what
26:12
the interesting infrastructure trade -offs
26:14
are at cursor? We
26:18
build lots of different pieces of infrastructure. There's
26:20
sort of the traditional company infrastructure, but then
26:22
there's also a lot of things. The
26:24
one that we've been sort of very public about
26:26
is our indexing infrastructure. We
26:28
spent a lot of time optimizing
26:31
and running at quite enormous
26:33
scales, like billions of files per
26:35
day kind of infrastructure. And
26:38
for that, we want our own
26:40
inference. So for all the models
26:42
that sort of embed your files, we
26:45
run, you know, an enormous amount of inference. It's a really large pipeline. So, like, if you're some big company and you have like 400,000 files or 500,000 files, you want the ability for, while the user's coding, it to effectively feel like it's being instantly synced across to the server, while the model is using
27:09
the embeddings to like search the code base
27:11
or edit the code base, et cetera, et
27:13
cetera. So
27:16
scaling that has been quite a
27:18
challenge and I think there's been
27:20
this broad category of databases that
27:22
are being built on top of
27:24
S3 and we're like a big
27:26
believer in this approach of how you should build your database. I don't think there's, like... the usual term is sort of separation of storage and compute, disaggregated-storage databases. So the classic example of this is, we
27:46
use TurboPuffer. The TurboPuffer
27:48
stores most of the
27:50
vectors on an S3
27:52
sort of path, and
27:55
then they have a write-ahead log, and you sort of write to this write-ahead log. Then there's some compaction process; it compacts the write-ahead log back into the database.
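A toy sketch of that disaggregated-storage pattern: new vectors land in a small write-ahead log, and a compaction step periodically folds the log into immutable segments on object storage. The dict standing in for S3 and the brute-force search are illustration-only assumptions, not how TurboPuffer is actually implemented.

object_store: dict[str, list[tuple[str, list[float]]]] = {}   # "S3": segment name -> vectors
wal: list[tuple[str, list[float]]] = []                        # recent, not-yet-compacted writes

def upsert(doc_id: str, embedding: list[float]) -> None:
    wal.append((doc_id, embedding))        # cheap append; a durable log in a real system

def compact() -> None:
    """Fold the write-ahead log into a new immutable segment on object storage."""
    if wal:
        object_store[f"segment-{len(object_store)}"] = list(wal)
        wal.clear()

def search(query: list[float], k: int = 5) -> list[tuple[str, list[float]]]:
    """Queries must see both compacted segments and anything still in the log."""
    def score(vec: list[float]) -> float:
        return sum(q * v for q, v in zip(query, vec))
    candidates = [item for seg in object_store.values() for item in seg] + wal
    return sorted(candidates, key=lambda item: score(item[1]), reverse=True)[:k]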
28:09
And then there's sort of new challenges
28:11
we've been dealing with with this
28:13
indexing infrastructure. So we've been thinking
28:15
about: is there a way in
28:17
which you can support shared code
28:19
bases? So, you know, all the people at Weights & Biases have a really big code base. Hopefully
28:26
in the future, you know, you
28:28
will be able to spin out
28:30
background models, editing your code base.
28:32
And so, you know, we want
28:34
thousands, if not tens of thousands
28:36
of sort of... clients that are
28:39
connecting to that codebase, and we
28:41
don't want to have 10 ,000
28:43
copies of the Weights & Biases codebase, most of which are not being utilized. So couldn't we have a shared trunk, and
28:51
then every single person can have their
28:53
branch off that trunk? That
28:56
architecture is still, we're
28:59
working on it. It's not exactly
29:01
easy to do, because how
29:04
do you easily branch this vector database? At the
29:06
end of the day, you want to be able to
29:08
query both the trunk and your branch and merge
29:10
them in a way that you still get the correct
29:12
top K chunks. That's not trivial.
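A hedged sketch of why the merge is not trivial: search the shared trunk index and the branch's small delta index, drop trunk hits for files the branch changed or deleted, and merge into one top-k. The query functions and tuple layout here are assumptions for illustration.

def merged_top_k(query_trunk, query_branch, changed_files: set[str], k: int = 10):
    """query_* return lists of (score, file, chunk) sorted by score, descending.
    Trunk hits for files the branch touched are dropped (the branch's copy wins),
    so we over-fetch from the trunk; a fully correct version would keep paging
    until k surviving trunk hits are found."""
    branch_hits = query_branch(k)
    trunk_hits = [hit for hit in query_trunk(2 * k) if hit[1] not in changed_files][:k]
    return sorted(branch_hits + trunk_hits, key=lambda hit: hit[0], reverse=True)[:k]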
29:15
So when I fire up cursor, it's like
29:17
quietly indexing all the files that are
29:19
in my project. So we try
29:21
to, yes, exactly. So when you fire up
29:23
Cursor, it quietly indexes every single thing, as long as you allow us and it's turned on by default. One really popular
29:32
cursor use case is like you open up a
29:34
GitHub repo, you clone it, and then you
29:37
fire up cursor in that GitHub repo, and now
29:39
you can quickly ask questions about it. And
29:41
we try our best to make it effectively instant
29:43
to you. You index these really, really large
29:45
code bases. Obviously,
29:47
if you clone LLVM, which
29:50
is 120 ,000 files, that will
29:52
take us a bit longer. So
29:56
for example, an interesting
29:59
infrastructure question for the listeners, or whoever you guys like pondering about, is how should we allocate this token capacity? So we at
30:07
any point in time have a fixed number
30:09
of GPUs, which means we have a
30:11
fixed amount of token capacity. You know, you want to index LLVM or Weights & Biases, and that's a really large codebase, and there's a bunch of people that have a number of small code bases. Should the small code bases always be allowed to go through and the big one be slow, or should it take a lot of the capacity in the beginning and everyone else gets a smaller chunk? In either case, you shouldn't get a really bad experience. And that kind of question is still sort of hard to get right.
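A toy policy for that scheduling question: with a fixed pool of embedding-token capacity per tick, protect a slice for the many small repos and give whatever is left to the huge clones. The numbers and the 50/50 split are arbitrary assumptions; this is one possible answer, not Cursor's actual scheduler.

def allocate(capacity_tokens: int, jobs: list[dict], small_share: float = 0.5) -> dict[str, int]:
    """jobs: [{'name': ..., 'remaining': tokens_left}]; returns tokens granted this tick."""
    small = [j for j in jobs if j["remaining"] <= 1_000_000]
    large = [j for j in jobs if j["remaining"] > 1_000_000]
    grants: dict[str, int] = {}
    small_budget = int(capacity_tokens * small_share)
    for job in small:                       # small repos share a protected slice
        grants[job["name"]] = min(job["remaining"], small_budget // max(len(small), 1))
    leftover = capacity_tokens - sum(grants.values())
    for job in large:                       # the big clones split whatever is left
        grants[job["name"]] = min(job["remaining"], leftover // max(len(large), 1))
    return grants

# e.g. allocate(10_000_000, [{"name": "llvm", "remaining": 50_000_000},
#                            {"name": "small-app", "remaining": 200_000}])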
30:42
Well, how do you think about that? Currently,
30:46
we try to keep both
30:49
sides relatively happy. So
30:51
you can boost up your capacity up
30:53
until the next thing, but I'm still looking
30:56
for better answers. I think we didn't
30:58
spend that much time thinking about it, but
31:00
hopefully there's a really good answer to
31:02
how to make people happy. There's
31:05
no serverless GPUs, right? There's no
31:07
great serverless option. Because at the end
31:09
of the day, the amount of
31:11
compute we're spending is still fixed. The amount of compute is just the amount of compute for your code base plus the amount of compute for
31:20
like every single other person that we're indexing. So
31:22
in an ideal world that'd be this
31:24
phenomenal, marvelous thing where you could
31:26
boost up your capacity and then, you know,
31:29
people can use that capacity and we could
31:31
boost it down again, which is
31:34
what would happen in CPU land and that
31:36
sort of infra has not been built for
31:38
GPU land. Is indexing the
31:40
main thing that your GPUs are
31:42
doing? Because you're also running lots of
31:44
models, too. Yeah, yeah. So we
31:46
run the tab model. Indexing is a
31:48
very small percentage of our GPUs.
31:50
I mean, we run the tab models,
31:52
and hopefully we'll be running much
31:54
larger models in the future. And yeah,
31:56
they far and away dominate most
31:58
of the compute cost. I see.
32:00
So this is mostly running the tab models. Yeah.
32:03
So tab models, like hundreds of millions of calls per
32:06
day. Big
32:09
models we're running
32:11
have thousands of
32:13
requests. Without
32:17
going into detail, there's thousands of requests going
32:19
on. We're scaling
32:21
up these models as fast as we can. They
32:27
definitely take up far more compute. It
32:30
makes sense also, they're larger. One
32:33
intuition to have is, again, you're doing tens
32:37
of thousands of tokens of inference
32:39
per keystroke per person, which is both
32:41
really cool and also really scary
32:43
if you're running the inference. Obviously
32:46
caching really helps, but
32:49
it's still scarier
32:51
than running a
32:53
server.
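A toy illustration of why that caching matters so much: consecutive tab requests share almost their whole prompt (the file up to the cursor), so only the newly typed suffix is new work. Characters stand in for tokens here, and the set stands in for a real KV/prefix cache; the numbers are illustrative assumptions.

cache: set[str] = set()        # prefixes whose computation we pretend to have kept

def tokens_to_compute(prompt: str) -> int:
    longest = max((len(p) for p in cache if prompt.startswith(p)), default=0)
    cache.add(prompt)
    return len(prompt) - longest           # only the un-cached suffix costs compute

doc = "def fib(n):\n    if n < 2:\n        return n\n    return fib(n - 1) + fib(n - "
print(tokens_to_compute(doc))              # first request pays for everything
print(tokens_to_compute(doc + "2"))        # the next keystroke pays for ~1 token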
32:56
Have there been any surprises as
32:58
you've scaled up this ML infrastructure?
33:01
You've got to be one of the fastest-scaling ML companies ever. Like, have there been any kind of pitfalls, or, I don't know, what's that experience been like? Smooth? Um, there have definitely been glitches, but I think, like, again, the team is really, really talented and we've sort of gotten over it. Nice. What
33:21
about, I mean, we're talking like, you
33:23
know, maybe two weeks after DeepSeek
33:25
came out and then obviously caused investors
33:28
to like change their mind about
33:30
Nvidia stock. Did it, like, update your beliefs at all? It's been really weird to me, because I think we're
33:36
like, both on the Lex pod,
33:38
but also before that we've been pretty
33:41
public about using DeepSeek in many
33:43
ways and we used to use their
33:45
1.5 series models and then switched
33:47
over to their V2 series models.
33:49
So it was like big shock to
33:51
me that like everyone was sort
33:54
of like going, this is some
33:56
new thing, you know, they've been producing
33:58
phenomenal work for a while. Their
34:02
models, like I used to joke, like
34:04
they were one of the three or four
34:06
or five companies that you would trust.
34:08
to, like, produce good models, where the numbers wouldn't feel like they were juiced up, in a way that there were certain models that felt like their numbers had been a little bit too juiced. By juiced, I mean they were really high on evaluations, but then if you used the model in practice, you would never enjoy using the model. It's just very specific to
34:27
some of the evaluations. But
34:30
DeepSeek, I felt like, was very
34:32
honest about things and has been
34:34
producing really good models. So we've
34:36
been running the DeepSeek v2 model for
34:38
eight or 10 months now, probably
34:41
12 months, something like that. on
34:44
our own
34:46
inference. That's the
34:48
model we've scaled up to hundreds of millions of
34:50
calls. Interesting. How
34:52
did you choose it? Was it just
34:54
the best? We knew it was the
34:56
best. They had been producing extremely good
34:58
open code models. We have our
35:01
own post -training stack and we
35:03
do our own stuff. But
35:05
for just picking a really
35:07
well-pre-trained base, DeepSeek
35:09
does a phenomenal job. The
35:12
data they train on is really good, and
35:14
the model is both quite knowledgeable, quite smart,
35:17
and also quite cheap to run for
35:19
the tab in particular. And
35:21
I think in general, I'm really
35:23
excited about DeepSeek v3. I think DeepSeek v3 is actually a really well-pre-trained base for a lot of things. And
35:30
I suspect it will be
35:32
very, very useful for making
35:34
these custom applications. So
35:39
you obviously launched
35:41
agents, and it's
35:43
pretty cool, but it's also kind of
35:45
contained in how many iteration steps
35:47
the agent will do and things like
35:49
that. Where do
35:51
you see agents going? I mean, obviously inference has to get a lot cheaper. It seems like it could go much broader if you wanted it to. Like, what are you thinking? We're super focused on it. I think
36:05
as people have been getting better at doing
36:07
RL, the models have been getting better at both thinking and also being extremely coherent. So I think one of the things that is talked about less is that the models have gotten good at producing tens of thousands of tokens of output, which they were not before. I think they would sort of go into a delusional mode after a couple thousand tokens, and now they've gotten quite a bit more coherent.
36:26
And that comes from doing
36:28
RL and really, really good
36:30
post -training. And
36:33
I think agents were bottlenecked
36:35
by that particular aspect of
36:37
coherency. One of the
36:39
things that makes the Sonnet experience
36:41
really magical for using in an agent
36:43
is that it's so coherent over
36:45
such a long period of time, like
36:48
over tens of tool calls and you
36:52
know, I suspect as the tasks get
36:54
harder and harder, it would need to be coherent over hundreds, if not thousands, of tool calls, and we're working on it.
37:02
One of the things that I think about, like,
37:04
again, like back to the mission of the
37:06
company, the mission of the company is sort of
37:08
what is the... We want to automate as
37:10
much of coding as possible while still having
37:12
the developer in the front seat. And
37:15
automating coding in the short
37:17
term involves, you know,
37:21
allowing developers, in the cases where they want to sit back and let the model code, to do that, but in the cases where they want to drive the editor, to make code themselves. Like, I don't know, you're doing Weights & Biases things and you want to, like,
37:33
switch your GRPC thing to some other TLS
37:35
package in Rust, like you should just be
37:37
able to tell the model like, I want
37:39
to switch my GRPC thing to, to use,
37:42
you know, Rust TLS instead of something else.
37:44
And the model should just get it and
37:46
be able to make these large-scale, codebase-wide changes. And
37:50
that requires the model to have
37:52
some agent type things, because you're never
37:54
going to sit down and write
37:56
out exactly the spec of what you want.
37:58
Then the thing that the agent
38:00
really helps with is you don't have
38:03
to sit down and explain like, yeah,
38:07
we are W&B, we make
38:09
this. We
38:11
have a backend that is written in
38:13
Rust and Go. The Rust hooks up
38:15
to Go in this way. For our library, we use this, and the model should just go and figure
38:19
it out. My
38:21
own experience of playing with agents,
38:23
which is much diminished compared to
38:25
yours, is that when it breaks,
38:28
it's a challenge to debug. Have
38:30
you built any systems internally for just
38:32
even looking at, okay, what is the agent
38:34
doing? Why did it get in a
38:36
weird loop here? What's happening? How
38:38
do you visualize that? Oh,
38:41
we're building our own infra for now.
38:43
I suspect that there will be phenomenal
38:45
products in the future that will make
38:47
this much easier. For
38:50
now, the same thing
38:52
with building prompts. So we used
38:54
this internal library called Quiant. And
38:58
the way we built it was that it was well suited to our own needs and design. And
39:02
I think, for the same reason, with agent infrastructure,
39:04
we'll be building our own infrastructure in the short
39:06
term. And I suspect in the long term,
39:08
there'll be some phenomenal, you know, DevTools that
39:13
will come up to make it much
39:16
easier to both inspect the chains, be
39:18
able to stop at any point and
39:20
restart the chains, be able
39:22
to debug them in production when something weird
39:24
goes wrong, all sorts of things that you
39:26
would need to be able to run like
39:28
a production system at scale. Is
39:31
the agent evaluation like more also
39:33
like a, it sounds like it's more
39:35
of a vibes -based approach than like
39:37
specific metrics? Yeah, so it's pretty clearly vibes-based. I suspect it'll be vibes-based in the short term, and, as we get better at shipping these, it'll become more and more sort of driven by metrics, and you'll be much more operational with
39:51
it. When
39:53
you look at like something like
39:55
a Devin or like these
39:57
sort of like completely automated, like
40:00
no-programmer approaches,
40:02
do you view that as like
40:04
competitive or interesting or like,
40:06
what is your... I think it's interesting. In the medium term,
40:10
if you can actually take your
40:12
hands off and let the
40:14
model drive your entire editor or
40:16
let the model drive the
40:18
entire editing process, I am
40:20
totally open to it. But
40:23
in the case where it's not
40:25
really useful and boring and not
40:27
really that fun, we
40:30
just wait. We
40:33
just wait, we just wait
40:35
until it gets good enough like we keep
40:37
training the models and at some point
40:39
it will get good enough and then it
40:41
will be really fun to use I
40:43
think in general, over a one-to-two-year timeframe, I expect that the way people will code will change. And I think in the short term that seems really scary, but I think it'll be this gradual process, and it'll be extremely natural to everyone coming in, the way coding is changing. I think, for
41:02
example, the transition from not having a copilot to a copilot
41:06
was extremely natural in retrospect. It
41:09
was not something that was scary
41:11
to anyone. It was this thing
41:13
that predicted your next thought, and
41:15
you were like, wow, this is
41:18
phenomenal. And you just started using it. And then the transition from this copilot to this, you
41:26
know, foreground agent interface, where the model does edits across multiple different files, and
41:31
you're like oh I want to switch
41:33
this to use Rust TLS and I want
41:35
to you know make sure that you
41:37
always use HTTP2 and blah blah blah like
41:39
the model gets it and it reads
41:41
all the files and it makes the changes
41:43
and you can immediately review the changes
41:45
very quickly and tell that they're correct, and
41:47
that was also pretty natural I don't
41:49
think there was any point in the middle where people felt disoriented. And I think as that sort of goes to background things, it'll be... all
41:59
these things are always, you know, more, more
42:01
gradual than one would expect. You would have
42:03
expected in 2020 that like if I said,
42:05
the way you'll be coding is you sort
42:07
of start talking to the computer and it'll
42:09
make changes to random files and you'd be
42:12
like kind of freaked out. You'd think, oh,
42:14
it's going to add all these bugs. It's
42:16
going to be impossible to review. Like I
42:18
really enjoy coding. Why the fuck am I
42:20
doing this? Yeah. That like all,
42:22
all these things would have seemed scary. and
42:24
yet, five years in, four years into the language model journey of products, things feel quite natural. So, like, Copilot in 2021 to 2025, where we are now. And at any
42:40
point in time, you know, making the
42:42
change has not felt very disorienting, which
42:44
like maybe in one step it would
42:47
have, but right now it's not really that
42:49
disorienting. Well, it feels like a
42:51
lot of fun to me. I mean, like it's
42:53
like, I guess like when I like connect the dots
42:55
from like 2020 to now. It's
42:57
gone better. It's gone better, right? It's like, yeah, I guess, you know, when I, when
43:06
I look a few years out,
43:08
it's, I have no idea, but it's
43:10
hard not to see that like
43:12
a world where you wouldn't really be
43:14
doing anything that looks like programming
43:16
a few years out, right? Or more
43:18
people will be coding, more people
43:20
will be making much more difficult things.
43:23
like things that are considered much
43:25
more difficult, be it lower
43:27
level things, be
43:29
it larger
43:31
projects, even for their
43:33
side projects. I think people
43:35
are usually very conservative with their side projects because
43:38
they're like, oh, you know, I probably won't have that
43:40
much time. I think people will
43:42
get much less conservative with these side projects.
43:44
I'm generally just extremely optimistic in the medium
43:46
term. Yeah, yeah. Do
43:49
you feel like at all
43:51
like... I
43:54
mean, I guess, first of all,
43:56
don't you think it's a totally different
43:58
world where everyone can do these
44:00
monster side projects easily? That seems
44:02
like software is a very different
44:04
feeling. Even doing
44:06
a software company seems like it
44:08
might be hard to have a
44:11
protected advantage as much, right, when
44:13
it's easy to build this stuff? I
44:16
can't philosophize over that too much. I'm not really scared of people having medium-sized side projects. I think of these things as, like, experimentation becomes much more,
44:28
much more natural. I think a lot
44:30
of the things that large changes
44:33
are usually scary at companies because a
44:35
large change requires changing so many
44:37
pieces and changes so much time that
44:39
you want to plan out everything
44:41
upfront. and then planning is really hard
44:43
because you can't really foresee how
44:45
your production system will look if you
44:47
do XYZ, then everything becomes much
44:50
more scary and then you add more
44:52
meetings and it becomes more formal
44:54
and then everything just becomes worse and
44:56
worse over time. I understand
44:58
it, right? If you're doing a multi-year
45:00
database transition, boy, do you want to
45:02
plan out every single small detail and
45:04
then you want to argue over every
45:07
single small detail. But if
45:09
you can start prototyping these things
45:11
really quickly, Maybe
45:13
it becomes less talking, more coding.
45:16
You have much cleaner concrete
45:18
artifacts. If
45:20
you're in PyTorch and you want to do a
45:22
small API change in PyTorch, it'll take a journey.
45:24
You probably want to debate the hell out
45:26
of it. If you're in
45:28
PyTorch and you can have
45:30
a prototype in three days, maybe
45:33
you should just argue with the
45:35
prototype now. Is that how
45:37
you do things at cursor? Hopefully
45:41
more and more. So yeah, I
45:43
mean, there are still things that are
45:45
scary, but definitely, I think I found myself thinking it's just much better to argue with the prototype. I suspect that that change will continue. Awesome. Well,
45:53
I guess one final question,
45:56
if something comes to mind
45:58
when you think if you were
46:00
sort of outside of cursor
46:02
and kind of like fresh eyes
46:04
into this, you know kind
46:06
of world of AI applications and
46:09
LLMs that are kind of working for so many different things.
46:12
Is there something else that kind of excites you
46:14
that you wish you had time to think about? Personally,
46:18
for me, I've always
46:20
wanted sort of like
46:22
a really good reading
46:24
experience. I
46:26
like to spend my time
46:29
sort of free time either reading
46:31
or spending time even reading
46:33
code bases. I think it's
46:35
sort of this underrated aspect of
46:37
coding that like all of us
46:39
produced some of these artifacts that
46:42
we've poured our many years of
46:44
our life into. Redis, someone has poured their life into Redis. And I really want to go read and understand Redis. What were the hard
46:53
decisions? What were the easy decisions? And
46:56
I think both for reading
46:58
books, for reading papers, and
47:00
for reading code bases, we
47:02
haven't discovered the final optimal
47:04
AI tool. I think
47:07
hopefully cursor will contribute to at least
47:09
reading code bases, but maybe, you know,
47:11
someone makes it easier to read books
47:13
or to read papers, I'll be really
47:15
happy. Like reading papers
47:17
is still like quite an arduous process. I
47:19
mean, PDF viewers, I don't love the current
47:21
PDF viewers. You're still, like, you click a thing and it'll jump you to the final thing. It
47:26
feels like a lot more
47:28
primitive than it should be.
47:32
And, you know, I've recently been reading papers
47:34
by just pasting them into one
47:36
of these sort of chat
47:38
apps, and that's still pretty bad. I think in
47:42
general, it feels like there's a
47:44
lot of low-hanging fruit in lots
47:46
of different areas of life. Okay,
47:50
I got to ask it. What are
47:52
your top recommended reading code bases?
47:58
Well, as I just mentioned, Redis. Redis is
48:00
quite good if you haven't read it. It's
48:03
relatively small. And
48:06
it's still quite fun. Probably that's
48:10
the one that I'd most recommend
48:13
people because it's the thing
48:15
that is used by everyone and
48:17
it's just really, really well -written.
48:19
SQLite, for sure, also, if
48:21
you haven't read SQLite. Again,
48:25
very well -written. It's this coherent document
48:27
by a very small number of people. And
48:30
then I think most of the others recommend like
48:32
software that you use, you should. try to go
48:35
read the software that you use. I
48:37
mean, some things are harder, but like I did. If
48:39
you're a fan of Ghostty, the terminal, maybe
48:41
you should go, go spend a weekend trying to
48:43
read Ghostty, or like if you're a fan
48:45
of PyTorch, maybe you should go look into why
48:47
PyTorch does what it does. I
48:49
think there's a lot of choices
48:51
that you can sort of criticize on
48:53
the outside, and people underappreciate
48:55
the tremendous amount of work that people
48:57
say, on the PyTorch team have
48:59
put in to make PyTorch like really,
49:01
really easy for you to use.
49:03
And there's a magical experience where all
49:05
the sort of gradients flow naturally
49:07
that has taken many tens of thousands
49:09
of engineering hours. I don't know
49:11
if it's in the hundreds of thousands
49:13
or millions, but it's like a
49:15
lot of engineering hours. Interesting.
49:19
Well, thank you so much. I really appreciate your time. Thanks
49:23
so much for listening to this episode
49:25
of Gradient Dissent. Please stay tuned for future
49:27
episodes.