Episode Transcript
0:01
You're listening to Gradient Dissent, a show about making machine learning work in the real world, and I'm your host, Lukas Biewald. Sualeh Asif is the CPO and co-founder of Cursor, one
0:15
of the best loved and
0:17
most exciting and popular AI
0:19
products out there. It helps
0:21
you with coding, helps you
0:23
use LLMs to do coding. I
0:26
use it all the time, and I really love
0:28
it. And I was just excited to ask him
0:30
about how he built such a great product. I
0:32
found his answers super interesting, and I hope you
0:34
enjoy this interview. All
0:38
right. Well, thanks so much
0:40
for taking the time to
0:42
talk. I guess maybe this is
0:44
a softball question, but I
0:46
was really interested in just hearing
0:48
the story of Cursor, like
0:50
how you started it, what
0:53
the moment was where it really started to
0:55
take off because, you know, now it's like one
0:57
of the most loved products I think out
0:59
there. I
1:01
mean, history comes from,
1:04
we had been
1:06
really interested in sort
1:08
of scaling laws
1:11
and back in college,
1:13
sort of, I had gone on and worked
1:16
on a sort of search engine type company
1:18
with a friend, and there
1:20
we were really bullish on language models, because
1:22
it felt like language models could
1:25
really compress all the world's information, and
1:27
there should be this end -to -end
1:29
index of searching the internet. Instead
1:32
of many of the heuristics we have coded
1:34
in over the years, it felt like
1:36
you could sort of, that should be
1:38
the end-to-end way of doing things. So, scaling laws, doing the search engine, training large models at the time. I think Copilot was the first really big moment for us, where
1:53
it was this project that was truly
1:55
magical. It was fast. It
1:58
felt like it kind of knew you. But then Copilot did not improve much over
2:03
the coming year or two. And
2:06
for us, when we saw GPT-4, we thought the ceiling for what a really, really great product could be at that moment was really high. And then it was like
2:20
pretty clear that, like, as the models got much better, you know, as scaling laws progress and models get much better, the product that can be built in the future has an even higher ceiling. And that was
2:32
like, it was just this sort of super attractive thing to go toward. And, you know, we're all coders at heart, and we wanted to be building things that we use every day. And
2:45
you know, cursor was originally built
2:47
for ourselves in many ways. It
2:49
was, and it
2:52
was sort of fun seeing that, you know,
2:54
everyone else really liked it. It was definitely built
2:56
for ourselves. And
2:59
we were sort of experimenting. So a lot of
3:01
the early culture of the company was experimenting
3:03
with various different ways of using the models. Should there be a document where you're sort of typing things out and the model is coding things? If
3:12
you want to do this next action prediction: you're in a location, what should be
3:18
the edit? Maybe the model should be telling
3:20
you where to go next. You should be
3:23
able to make edits over your entire
3:25
repository. Some
3:27
of those things have
3:29
taken a year, a year and a
3:31
half, several iterations,
3:33
and some of them
3:36
we've continued building on. Now,
3:38
one of the core parts of the product is this next action prediction thing, where
3:43
it predicts your next
3:45
edit at the correct location and then where you
3:47
should be going next, and
3:49
people really, really love that feature. Then
3:52
we're working our way towards,
3:55
you should just be able to make any edit you
3:57
want across the entire repository, like
3:59
codebase-wide. Obviously,
4:02
there are some problems along the way that we'll talk about, some easy, some, you know, sort of still quite difficult. Like, models still struggle with what exactly
4:13
the architecture of the repository is.
4:15
If you, you know, ask,
4:17
what is the architecture of the
4:19
repository that is really quite
4:21
difficult because it requires sort of
4:24
looking at potentially billions of
4:26
tokens, tens of billions of tokens
4:28
and say, asking the question,
4:30
what is really going on as
4:32
opposed to like, you could
4:34
like list the function names, right? But that doesn't really tell you, you know, what is exactly going on. Well, totally,
4:42
I want to dive into that as much as
4:44
you're comfortable sharing. But I guess I wanted to
4:46
ask you, you know,
4:48
one of the surprising things that I learned
4:50
in my, you know, background research on you
4:52
is I think you guys came from using
4:55
Vim, not VS Code. Is that, is that
4:57
right? All of us were
4:59
really early users of Vim. We did eventually, you know, move to VS Code. Probably the last ones to, there were a couple of us. Aman and Arvid probably were the last to switch over from Vim to VS Code, and the trigger there was GitHub Copilot. Oh, I
5:13
see. So GitHub Copilot actually
5:15
pulled you over in the end.
5:17
So I had switched over before,
5:20
but then Aman and Arvid only
5:22
switched over after GitHub Copilot came out. It was just the killer feature, right? In some ways, it was the killer feature. Totally. And why
5:30
doesn't something like Vim actually have
5:32
something like what you guys built?
5:34
It seems like a lot of
5:36
smart coders like to use it.
5:39
Is there something about it like a
5:41
graphical interface that lends itself to
5:43
this kind of structure, like coding with
5:45
an AI? I
5:47
think for us, VS Code
5:50
is, for one, it's pretty
5:52
clear the most loved platform on
5:54
the internet for coders; it's the thing that sort of is the de facto standard. It's the default. And
6:03
we wanted to sort of
6:05
incrementally evolve it towards the
6:07
world where you're starting to
6:09
automate coding. And
6:11
the cursor of one year from now should
6:14
look very different from the cursor of today,
6:16
which means almost by default, it should
6:18
not look exactly like VS Code. In
6:23
looking very different, we didn't want it to just be a text box where you prompt for code, because coders still want to type characters. You
6:33
want to be able to edit your
6:35
entire repository at a higher level, but
6:38
at some point, if you
6:40
find that there's a change that you can
6:42
quickly execute in 10 keystrokes, we want to let
6:44
you be able to dive into the details at any point in time. Maybe a year from now, right, like, humans are editing some pseudocode representation. And
6:59
that's really quick to edit, and the model is sort of working for you in the background. But if you're writing some kernel and you want to go in and tweak some of the indices, it's much easier to do it by hand. Then you always, I
7:11
think developers will want this ability to go
7:13
in and, you know, unless
7:16
we truly believe that everything is going away,
7:18
it's like you really, really want the fine
7:20
-grained control. Yeah, yeah, that
7:22
makes sense You know one thing that
7:24
that strikes me from what you were saying
7:26
earlier about Like observing that co -pilot was
7:28
really great and there's all these you
7:30
know all this opportunity and how do you
7:32
kind of work with these AI models
7:35
is I think a lot of people other
7:37
people thought that at the same time
7:39
so you know like you know you had
7:41
this idea that I think of many
7:43
people had including a bunch of like YC
7:45
companies and other products that I saw
7:47
and it seemed like cursor emerged
7:50
as kind of the winning one among
7:52
these, right? So it seems like there was
7:54
great product execution here, which I'm always
7:56
really interested in. Like, do you have a
7:58
sense for what you were doing differently
8:00
than your competitors that made your product work
8:03
so well? Was it like, was it
8:05
like certain decisions or was it like a
8:07
process? You know, "why" questions are really hard. I don't know. It's
8:13
very hard to tell exactly what we
8:15
did, right? I think there was a
8:17
bunch of things where we
8:19
always tried to push the ball
8:21
as much as possible without being...
8:23
We always wanted to be the
8:25
most useful product at any moment
8:27
in time. It's like at the
8:30
frontier. It's very
8:32
easy to overpromise and underdeliver.
8:35
And a lot of what we have
8:37
tried to do is to... We
8:39
didn't ship the agent until we were
8:41
very confident it was something that
8:43
was really useful. And we
8:45
had probably done three agent
8:47
prototypes before that that we didn't ship, because some version of the model would just lose track, and you could make something that could
8:59
help you in the short term
9:01
and really hurt what people think
9:03
of as a reliable product in
9:06
the long term. Maybe
9:08
that is part of it. It was always, but
9:10
then also I think we've been first to a
9:12
lot of the inventions that people really like. So
9:15
more recently,
9:19
the ability to jump to the
9:21
next location that should be
9:23
edited is something we've had for
9:25
closer to eight months, 10
9:28
months, a year, and
9:30
we hopefully will release a much
9:32
more upgraded version of it soon that
9:34
will be quite a bit better. And
9:38
only recently, you know, other people have
9:40
tried to do that. So we've always
9:42
tried to be like, think
9:44
of what's coming and at least have a prototype as
9:47
soon as we think it's something that should
9:49
be useful. There's the tab-to-jump feature. There's the apply feature. And
9:56
we also like, you know, we've done this at
9:58
scale also. So I think that that has benefited
10:01
where, for example, for
10:03
our custom tab model, we
10:05
do something like 100 million requests
10:07
a day, and that's quickly growing. I think part of doing
10:11
it well has been able to do it
10:13
reliably for lots and lots and lots of
10:15
people. Do you
10:18
think that any of the data that
10:20
you have or the feedback that
10:22
you have from users is part of
10:24
your success, or are you more
10:26
making decisions through your own experience? The
10:29
data has definitely been
10:31
enormously useful. I
10:34
think the feedback loops
10:36
that people consider obvious are
10:38
indeed extremely useful. You
10:40
want to be the company
10:42
that builds
10:44
an extremely good product
10:46
that everyone loves, and
10:48
then that definitely helps in making the next version
10:50
even better. It helps in
10:52
small ways, it helps in training
10:55
models, it helps in, yeah. Where
10:59
the small ways are, you understand
11:01
how people are using your product, what
11:03
is the most important thing to
11:05
ship at any moment, and then in
11:07
big ways in training models and
11:09
improving just the core workflows. For
11:12
example, technically speaking, one loop
11:14
in the apply use case
11:16
is, you're
11:18
creating your first version of an apply model that is quite a bit bigger, and
11:22
you then deploy it for all users, you
11:25
get lots and lots of data, and then
11:27
you can distill a slightly smaller model. That
11:29
gets faster, the people use it even
11:31
more, and you then distill an even
11:33
smaller model, and you can keep compressing the
11:35
models down because you're generating the data that allows
11:37
you to do that.
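To make that loop concrete, here is a minimal Python sketch of a distill-and-redeploy cycle; every name (collect_apply_traces, finetune, deploy) and the model sizes are illustrative stubs under assumption, not Cursor's actual pipeline.

# Minimal sketch of the distill-and-redeploy loop described above.
def collect_apply_traces(model_name: str, days: int) -> list[dict]:
    """Stand-in for logging (prompt, full-file rewrite) pairs from production."""
    return [{"prompt": "...", "target": "..."} for _ in range(1000)]

def finetune(student_size: str, traces: list[dict]) -> str:
    """Stand-in for supervised distillation onto a smaller student model."""
    return f"apply-{student_size}"

def deploy(model_name: str) -> None:
    print(f"serving {model_name} to all users")

model = "apply-large"                     # first version: bigger and slower
for student_size in ["medium", "small"]:  # each round distills a smaller student
    deploy(model)
    traces = collect_apply_traces(model, days=30)  # usage data from the deployed teacher
    model = finetune(student_size, traces)         # smaller model -> faster -> more usage
deploy(model)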
11:42
But also, yeah, it's just this feedback loop, and then some of the things get faster. So for now, up to, I don't know, a thousand or two thousand line file, apply feels effectively instant, and that's how we wanted it to feel. Right, right, we wanted it to feel like apply is deterministic, you know, like they figured out some deterministic algorithm to place the block, but that's not actually what's happening. It's a model that is actually rewriting the entire file. And a lot of the improvements have been making the model smaller. There's obviously improvements in just making the inference much faster when doing these speculative edits.
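A toy sketch of the speculative-edits idea, under the assumption that since most of a full-file rewrite matches the original file, the original can serve as the draft that gets verified in chunks; model_step is a stub, and in a real system each chunk would be verified in one batched forward pass rather than character by character.

TARGET = "def add(a, b, c):\n    return a + b + c\n"   # what the model wants to emit
ORIGINAL = "def add(a, b):\n    return a + b\n"        # current file, used as the draft

def model_step(prefix: str) -> str:
    """Stand-in for one decoding step: emit the next character of the target."""
    return TARGET[len(prefix)] if len(prefix) < len(TARGET) else ""

def speculative_rewrite(draft: str, chunk: int = 8) -> tuple[str, int]:
    out, passes = "", 0
    while True:
        passes += 1
        for ch in draft[len(out):len(out) + chunk]:   # verify a chunk of draft characters
            if model_step(out) != ch:
                break
            out += ch
        nxt = model_step(out)                         # one "real" decode step
        if not nxt:
            return out, passes
        out += nxt

rewritten, passes = speculative_rewrite(ORIGINAL)
print(rewritten == TARGET, passes)   # same output as plain decoding, in fewer passes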
12:16
For
12:18
something like the agents that
12:20
you talked about, you had some
12:23
iterations where it wasn't useful
12:25
enough to ship. How did
12:27
you know that it wasn't good enough to ship?
12:29
How did you think about that? I didn't use
12:31
it on a daily basis. I think these things
12:33
are really quite easy to figure out. If,
12:36
you know, you're coding 10 hours a
12:38
day in Cursor, right? Like, you boot up the editor, you're making the improvements, and you're seeing it on a daily basis. If the devs themselves don't use it
12:47
every single day, it's probably not something
12:49
that everyone else will want to use.
12:51
I mean, there's obviously corner cases to
12:53
this thing, where we're not the perfect
12:55
coders, but, you know, a thing like
12:57
an agent is such a general feature
12:59
that if you're not using it, it's
13:01
almost certainly not useful. That actually leads
13:03
me to another question I had, which
13:06
is you and your co -founders have
13:08
this background in sort of competitive coding,
13:10
right? Like, you know, does that, do
13:12
you think that's an advantage for you? Because
13:14
I could imagine that that might sort of
13:16
put you at sort of the forefront of
13:19
like wanting to be efficient in coding. But
13:21
I could also imagine that you might have
13:23
any idiosyncrasies in the way that you want
13:25
to write code that might be different than
13:27
your general user. I think we're not only
13:29
competitive coders. Like, we did competitive math and
13:31
coding because, you know, that's sort of part
13:33
of the background. It's always
13:35
really hard to distinguish, you know,
13:37
what part of your identity is
13:39
the most important, but many of
13:41
us had worked at sort of
13:44
software companies before, Stripe and the
13:46
like. And you
13:48
had some idea that production coding was
13:50
very different. And
13:52
then people had actually built products.
13:54
So I think Michael had spent
13:56
quite a bit of time building these
13:58
high-performance systems. And,
14:01
you know, we have done modeling work. So
14:03
we had seen quite a wide variety of coding.
14:07
So bringing it back, did competitive programming really, really affect how you did coding on a day-to-day basis? Like, not
14:15
really. It was
14:17
just, I think we knew what engineering
14:19
was. We were sort of doing day -to
14:21
-day engineering. And you could see if
14:24
the agent was helpful. And in this
14:26
case, it was very clear that, for
14:28
example, early iterations were not really that
14:30
useful. It was very slow. One
14:33
of the most important things that had
14:36
changed there is the length of the
14:38
context windows that you can do on
14:40
every single keystroke on every single request.
14:43
When the model started out, you would
14:45
start doing these 4k, 8k context windows.
14:48
Even if the model slightly supported them,
14:50
the models were not very good at
14:52
using the large context windows. Now
14:54
that the cost curve has gone down for the language models, you can easily do requests on the order of, you know, 50,000 tokens, 60,000 tokens reliably, and that
15:09
has like enormously helped. One
15:12
way, one intuition to have here is
15:14
if the model can't even read your current file,
15:16
like it would not be very useful,
15:18
let alone like read the rest of your
15:20
repository or do searches or lots of
15:22
things that you expect a basic agent to
15:24
be able to do. That
15:26
wouldn't work with 8k tokens. 8k tokens,
15:28
what can you even fit in there?
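A quick back-of-the-envelope for that question; the tokens-per-line figure is a rough assumption that varies by language and tokenizer, not a measured Cursor statistic.

context_tokens = 8_000
tokens_per_line = 10                       # rough average for source code
print(context_tokens // tokens_per_line)   # ~800 lines: often less than one large file,
                                           # before any instructions, history, or search results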
15:32
Another interesting thing that you guys said in
15:34
your interview with Lex Fridman is that
15:36
you kind of wanted the experience of
15:38
using a code editor to be fun,
15:40
which I thought was kind of a
15:42
cool idea, like a little bit surprising,
15:45
right? It seems like such a utilitarian
15:47
thing. It kind of reminded me that
15:49
when I... I remember when I switched, in Cursor, my default LLM from Sonnet to o1. Actually,
15:57
I think I started coding a little less.
15:59
I was actually having a lot less fun
16:01
because the latency was higher. It took me
16:03
a little while to realize that, but I
16:05
actually, for some reason, Sonnet had lower latency,
16:07
and it made it just a lot more
16:09
fun. I was like, you know what? I
16:11
just need to go back to the LLM
16:13
where I was just enjoying writing code. I
16:15
do actually relate to what you're saying, but
16:18
I'm curious how that idea of a
16:20
fun experience shows up
16:22
in your utilitarian feeling
16:24
application. I
16:26
think it just, I mean, there's
16:28
always this sort of end metric, right?
16:30
The end metric is how much we
16:33
are enjoying using the model. And it's
16:35
been very clear that we enjoy using
16:37
Sonnet more than o1. And
16:41
part of it is there's a
16:43
few things. So one is, I
16:46
think, Sonnet is, even at scale, like, reliably
16:50
quite fast. And
16:52
I think we want to ship
16:54
models that are even faster,
16:56
that are better than Sonnet, that
16:58
have much longer context windows, that could, you know, do edits reliably over a much larger part of your codebase,
17:05
for exactly the same reason, because it
17:07
becomes much more fun. So
17:09
it's, in some sense, it's this
17:12
hard to pin down feeling, but
17:14
in some sense, you know what
17:16
really affects it. Like, you will
17:18
get bothered if you have to
17:20
explain to the model again and
17:22
again what you're doing, or you
17:24
will get bothered if the model
17:26
doesn't really understand that you had
17:29
some easily viewed file open and
17:31
the model doesn't see it. And
17:33
that shows up as annoyance. It's just straight-up annoying. So you can turn it into some technical thing that you can track down.
17:43
But some of the inventions are just like, you
17:46
know, wouldn't it be more fun if blah,
17:48
blah, blah happened? Like, wouldn't it be more
17:50
fun if you were coding and the model
17:52
would just, once you started doing a refactor, you could tab, tab, tab through the entire thing, like 10 tabs, what would that take?
17:59
And then once you think like, oh, you
18:01
know, 10 tabs would make me feel
18:03
really, really happy, you can then sort of
18:05
reverse engineer the exact thing, like the modeling work that you would have to
18:10
do, so like, what size of the model
18:12
you want to train, how
18:14
much time you want to spend sort of pre
18:18
-training, post -training, and RL -ing
18:20
the models to be able
18:22
to consistently do the same
18:24
behavior again and again. Another
18:29
concrete example is you
18:31
could always over -train the
18:33
tab models to be annoying.
18:37
So part of this, if you were
18:39
to only worry about making sure that
18:41
every single time it does the edit, you
18:44
would overpredict. Like, sometimes you really want to
18:49
like, be writing some kernel. You
18:51
want to spend some time thinking
18:54
and you don't want the tab model
18:56
bothering you and that's the thing
18:58
you would only care about if you're
19:00
making it fun and enjoyable as
19:02
opposed to something that, like... yeah, something that's, you know, obviously just always overpredicting. But this is a
19:09
pretty subjective experience
19:11
that you probably couldn't pull from
19:13
user data. How do you work through
19:15
that internally? Do you ever have
19:17
a difference of opinion with yourselves around
19:19
what's the more fun approach? Yes. I
19:23
think some of these transitions are subjective,
19:25
but I think if you think it
19:27
out, they're not always that controversial. Interesting.
19:30
At the end of the day, you're trying
19:33
it out. There's
19:35
always some intuition where you might over-index
19:37
in some direction, but for the most part, I
19:40
think there's not that much
19:42
argument, or is Sonnet more fun,
19:44
or is O1 more fun?
19:46
I mean, Sonnet is arguably winning.
19:49
Hopefully, there'll be more models that are
19:51
optimized towards keeping you in the
19:54
flow. I think you need
19:56
two categories of models. You
19:58
need the category of models that
20:00
is RL'd towards being fast, having super large context windows, and just making edits across the entire codebase.
20:08
Make you feel like you're breezing
20:10
through things. And you want a
20:12
category of models that is trained
20:14
for being extremely careful for reviewing
20:16
every single small thing before they
20:18
make the edit. Maybe
20:21
they do a bunch of research, they then make the edit in the background for
20:25
you and then come back to you
20:27
with a PR. And in that case, the
20:29
thing that will be fun is if
20:31
they're more correct than not. And
20:34
then fast is not the only thing
20:36
that's fun. It's being correct or how
20:38
they write it out, how they prove to you that they're doing the right thing. I
20:43
guess as you kind of
20:45
build a bigger brand and
20:47
you build trust with users
20:49
like me, why are
20:51
you even asking me what model I want
20:53
to use? I'm sort of aware of the
20:55
different models, but I would sort of trust
20:58
you more to know what's going to be
21:00
fun and useful for me. Yeah,
21:02
I think you're kind of right. Part
21:09
of building the trust was always showing
21:11
exactly what we're using, and I think you're probably correct that we should have a default mode, and you should use the default and feel happy. But if you're the kind of person that wants to do more and wants to perfectly fine-tune every single thing, you should be able to do that, and then there should be the simple default. So there should be a release in a week or two that fixes all of this for you. Oh
21:36
Here's something I've been wondering about myself
21:39
quite a bit. Do you
21:41
think that, do you have best practices for changing the structure
21:45
of my own code base
21:47
or the way that I
21:49
should code to make your
21:51
product work even better? For
21:53
example, we have one engineer that's
21:55
been sort of letting the LLM
21:57
put in notes inside the code
21:59
base of helpful things to kind
22:01
of help understand the code base.
22:03
That's one of the things that we
22:05
did. We've been sort
22:08
of speculating about this; we don't
22:10
actually have a really correct solution
22:12
there, but this idea of
22:14
like maybe there should be a
22:16
readme.ai.md in every folder. With
22:19
the idea being at any point in
22:21
time, if you ask for changes around a
22:23
folder, the model should be able to
22:25
look up what's the nearest place where
22:27
there's an architecture written down that it
22:29
can read.
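A minimal sketch of that lookup, assuming a hypothetical readme.ai.md convention: given the file being edited, walk up the directory tree and return the nearest architecture note that could be pulled into the model's context. The file name and layout are assumptions for illustration, not a shipped Cursor feature.

from pathlib import Path

def nearest_architecture_note(edited_file: str, note_name: str = "readme.ai.md") -> Path | None:
    for folder in Path(edited_file).resolve().parents:
        candidate = folder / note_name
        if candidate.is_file():
            return candidate        # closest ancestor folder that documents its architecture
    return None

# e.g. nearest_architecture_note("src/server/handlers/users.py")
# -> src/server/readme.ai.md, if that is the closest folder with a note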
22:31
Sort of on the technical side, the thing to understand is that the models are much faster at reading
22:35
tokens than humans. And
22:38
like orders of magnitude faster at sort of ingesting these tokens. But
22:44
humans have, for example,
22:46
like some small things memorized.
22:49
So there are obviously small differences between
22:51
how we code, but the model is starting from scratch every time. So Cursor Tab in our codebase is named CPP, for Copilot++, and the model always sort of needs to be reminded that whenever I say Cursor Tab, you should actually search for Copilot++, or something like that. So
23:15
there are these facts and rules
23:17
that are quite important. I don't want the default to be that it would be better if everyone sort of changed their way of coding. I think the obviously better approach is that we just figure it out. We
23:31
should just spend all the
23:33
time and energy we need, all
23:35
the computing we need to
23:37
really nail down the architecture that
23:39
you have, really figure out
23:41
all the facts and rules. I
23:44
don't know if I have any interesting controversial ideas
23:46
for how that should be done. Someone
23:48
was joking that, you know,
23:50
maybe we should email you 10 rules in the
23:53
morning and you'll just, like, say yes or no on the 10 rules, and hopefully that'll build up the corpus over time. Like, you want a system
23:59
that allows you to add rules and then prune
24:01
bad rules. Like sometimes there will be, like
24:03
if you just ask the model to look at
24:05
a PR and give you some rules, sometimes
24:07
it will come up with bad rules and you
24:09
need a way of pruning them out. So, like, what is the minimal set of rules such that all your PRs become much easier?
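A hedged sketch of that add-then-prune loop: propose rules (for example from a PR), keep a running score of whether each rule actually helped, and drop the ones that don't earn their keep. The field names and thresholds are assumptions for illustration, not a real Cursor feature.

from dataclasses import dataclass

@dataclass
class Rule:
    text: str          # e.g. "Cursor Tab is called Copilot++ in this repo"
    shown: int = 0     # times the rule was included in context
    helped: int = 0    # times the resulting edit or review went well

def prune(rules: list[Rule], min_shown: int = 20, min_help_rate: float = 0.1) -> list[Rule]:
    """Keep rules that are still unproven or that are pulling their weight."""
    return [r for r in rules
            if r.shown < min_shown or r.helped / r.shown >= min_help_rate]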
24:22
Then there's the model that needs to look at all of the rules. I mean, we're still
24:26
sort of figuring it out, but I
24:28
think there's something important at the
24:30
core of this that is both in
24:32
terms of how humans would change and also
24:34
in terms of what we should change
24:36
just to make the defaults much better, because
24:38
not every single person will change. Of
24:41
course, but for example, do you
24:43
think smaller file sizes are better because
24:45
the model can more easily navigate
24:47
the code hierarchy or do you think
24:49
that creates complexity? There's
24:53
always some trade -offs. The funny
24:55
joke is that sometimes... people
24:58
will sort of keep adding to the same file
25:00
more and more until the model can't edit it
25:02
anymore, and then you just ask the model to
25:04
refactor that file for you, because you're just like
25:06
sort of, you know, in Cursor
25:08
terminology, you know, composing the file more and
25:10
more. And
25:12
it seems pretty clear to me that
25:14
there is obviously some advantage of
25:16
the model seeing all the context relevant to the current task in the same file, and also that, for future tasks, it'll be easier if the files are smaller. I
25:27
think infrastructure -wise, we will also make
25:29
it possible for you to sync
25:31
all of these files to a remote
25:34
server. So we will have a
25:36
big enough copy of your code base
25:38
at some point. So right now,
25:40
we're extremely privacy -conscious, and that means
25:42
we try to make sure that we
25:44
never store any code past the
25:47
life of your request. Ideally,
25:49
in the future, we can store at
25:51
least some part of it in a
25:53
private way that allows the model to
25:55
very quickly do reliable edits. So you
25:57
shouldn't have to make the round trips
26:00
for making every single small edit that
26:02
feels quite bad. What
26:05
else? You were telling me that
26:07
you run infrastructure. Also,
26:10
can you talk about what
26:12
the interesting infrastructure trade -offs
26:14
are at cursor? We
26:18
build lots of different pieces of infrastructure. There's
26:20
sort of the traditional company infrastructure, but then
26:22
there's also a lot of things. The
26:24
one that we've been sort of very public about
26:26
is our indexing infrastructure. We
26:28
spent a lot of time optimizing
26:31
and running at quite enormous
26:33
scales, like billions of files per
26:35
day kind of infrastructure. And
26:38
for that, we want our own
26:40
inference. So for all the models
26:42
that sort of embed your files, we
26:45
run, you know, an enormous amount of inference. It's a really large pipeline. So, like, if you're some big company and you have like 400,000 files or 500,000 files, you want the ability for, while the user's coding, it to effectively feel like it's being instantly synced across to the server, while the model is using
27:09
the embeddings to like search the code base
27:11
or edit the code base, et cetera, et
27:13
cetera. So
27:16
scaling that has been quite a
27:18
challenge and I think there's been
27:20
this broad category of databases that
27:22
are being built on top of
27:24
S3 and we're like a big
27:26
believer in this approach of how you should build your database. I don't think there's, like... the usual term is sort of separation of storage and compute, disaggregated-storage databases. So the classic example of this is, we
27:46
use TurboPuffer. The TurboPuffer
27:48
stores most of the
27:50
vectors on an S3
27:52
sort of path, and
27:55
then they have a write-ahead log, and you sort of write to this write-ahead log. Then there's some compaction process; it compacts the write-ahead log back into the database.
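A toy sketch of that disaggregated-storage pattern: new vectors land in a small write-ahead log, and a compaction step periodically folds the log into immutable segments on object storage. The dict standing in for S3 and the brute-force search are illustration-only assumptions, not how TurboPuffer is actually implemented.

object_store: dict[str, list[tuple[str, list[float]]]] = {}   # "S3": segment name -> vectors
wal: list[tuple[str, list[float]]] = []                        # recent, not-yet-compacted writes

def upsert(doc_id: str, embedding: list[float]) -> None:
    wal.append((doc_id, embedding))        # cheap append; a durable log in a real system

def compact() -> None:
    """Fold the write-ahead log into a new immutable segment on object storage."""
    if wal:
        object_store[f"segment-{len(object_store)}"] = list(wal)
        wal.clear()

def search(query: list[float], k: int = 5) -> list[tuple[str, list[float]]]:
    """Queries must see both compacted segments and anything still in the log."""
    def score(vec: list[float]) -> float:
        return sum(q * v for q, v in zip(query, vec))
    candidates = [item for seg in object_store.values() for item in seg] + wal
    return sorted(candidates, key=lambda item: score(item[1]), reverse=True)[:k]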
28:09
And then there's sort of new challenges
28:11
we've been dealing with with this
28:13
indexing infrastructure. So we've been thinking
28:15
about: is there a way in
28:17
which you can support shared code
28:19
bases? So, you know, all the people at Weights & Biases have a really big code base. Hopefully
28:26
in the future, you know, you
28:28
will be able to spin out
28:30
background models, editing your code base.
28:32
And so, you know, we want
28:34
thousands, if not tens of thousands
28:36
of sort of... clients that are
28:39
connecting to that codebase, and we
28:41
don't want to have 10 ,000
28:43
copies of the Weights & Biases codebase, most of which are not being utilized. So couldn't we have a shared trunk, and
28:51
then every single person can have their
28:53
branch off that trunk? That
28:56
architecture is still, we're
28:59
working on it. It's not exactly
29:01
easy to do, because how
29:04
do you easily branch this vector database? At the
29:06
end of the day, you want to be able to
29:08
query both the trunk and your branch and merge
29:10
them in a way that you still get the correct
29:12
top K chunks. That's not trivial.
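A hedged sketch of why the merge is not trivial: search the shared trunk index and the branch's small delta index, drop trunk hits for files the branch changed or deleted, and merge into one top-k. The query functions and tuple layout here are assumptions for illustration.

def merged_top_k(query_trunk, query_branch, changed_files: set[str], k: int = 10):
    """query_* return lists of (score, file, chunk) sorted by score, descending.
    Trunk hits for files the branch touched are dropped (the branch's copy wins),
    so we over-fetch from the trunk; a fully correct version would keep paging
    until k surviving trunk hits are found."""
    branch_hits = query_branch(k)
    trunk_hits = [hit for hit in query_trunk(2 * k) if hit[1] not in changed_files][:k]
    return sorted(branch_hits + trunk_hits, key=lambda hit: hit[0], reverse=True)[:k]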
29:15
So when I fire up cursor, it's like
29:17
quietly indexing all the files that are
29:19
in my project. So we try
29:21
to, yes, exactly. So when you fire up
29:23
Cursor, it quietly indexes every single thing, as long as you allow us and it's turned on by default. One really popular
29:32
cursor use case is like you open up a
29:34
GitHub repo, you clone it, and then you
29:37
fire up cursor in that GitHub repo, and now
29:39
you can quickly ask questions about it. And
29:41
we try our best to make it effectively instant
29:43
to you. You index these really, really large
29:45
code bases. Obviously,
29:47
if you clone LLVM, which
29:50
is 120 ,000 files, that will
29:52
take us a bit longer. So
29:56
for example, an interesting
29:59
infrastructure question for the listeners, or whoever you guys like pondering about, is how should we allocate this token capacity? So we at
30:07
any point in time have a fixed number
30:09
of GPUs, which means we have a
30:11
fixed amount of token capacity. You know, you want to index LLVM or Weights & Biases, and that's a really large codebase, and there's a bunch of people that have a number of small code bases. Should the small code bases always be allowed to go through and the big one be slow, or should it take a lot of the capacity in the beginning and everyone else gets a smaller chunk? In either case, you shouldn't get a really bad experience. And that kind of question is still sort of hard to get right.
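A toy policy for that scheduling question: with a fixed pool of embedding-token capacity per tick, protect a slice for the many small repos and give whatever is left to the huge clones. The numbers and the 50/50 split are arbitrary assumptions; this is one possible answer, not Cursor's actual scheduler.

def allocate(capacity_tokens: int, jobs: list[dict], small_share: float = 0.5) -> dict[str, int]:
    """jobs: [{'name': ..., 'remaining': tokens_left}]; returns tokens granted this tick."""
    small = [j for j in jobs if j["remaining"] <= 1_000_000]
    large = [j for j in jobs if j["remaining"] > 1_000_000]
    grants: dict[str, int] = {}
    small_budget = int(capacity_tokens * small_share)
    for job in small:                       # small repos share a protected slice
        grants[job["name"]] = min(job["remaining"], small_budget // max(len(small), 1))
    leftover = capacity_tokens - sum(grants.values())
    for job in large:                       # the big clones split whatever is left
        grants[job["name"]] = min(job["remaining"], leftover // max(len(large), 1))
    return grants

# e.g. allocate(10_000_000, [{"name": "llvm", "remaining": 50_000_000},
#                            {"name": "small-app", "remaining": 200_000}])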
30:42
Well, how do you think about that? Currently,
30:46
we try to keep both
30:49
sides relatively happy. So
30:51
you can boost up your capacity up
30:53
until the next thing, but I'm still looking
30:56
for better answers. I think we didn't
30:58
spend that much time thinking about it, but
31:00
hopefully there's a really good answer to
31:02
how to make people happy. There's
31:05
no serverless GPUs, right? There's no
31:07
great serverless option. Because at the end
31:09
of the day, the amount of
31:11
compute we're spending is still fixed. The amount of compute is just the amount of compute for your code base plus the amount of compute for
31:20
like every single other person that we're indexing. So
31:22
in an ideal world that'd be this
31:24
phenomenal, marvelous thing where you could
31:26
boost up your capacity and then, you know,
31:29
people can use that capacity and we could
31:31
boost it down again, which is
31:34
what would happen in CPU land and that
31:36
sort of infra has not been built for
31:38
GPU land. Is indexing the
31:40
main thing that your GPUs are
31:42
doing? Because you're also running lots of
31:44
models, too. Yeah, yeah. So we
31:46
run the tab model. Indexing is a
31:48
very small percentage of our GPUs.
31:50
I mean, we run the tab models,
31:52
and hopefully we'll be running much
31:54
larger models in the future. And yeah,
31:56
they far and away dominate most
31:58
of the compute cost. I see.
32:00
So this is mostly running the tab models. Yeah.
32:03
So tab models, like hundreds of millions of calls per
32:06
day. Big
32:09
models we're running
32:11
have thousands of
32:13
requests. Without
32:17
going into detail, there's thousands of requests going
32:19
on. We're scaling
32:21
up these models as fast as we can. They
32:27
definitely take up far more compute. It
32:30
makes sense also, they're larger. One
32:33
intuition to have is, again, you're doing tens
32:37
of thousands of tokens of inference
32:39
per keystroke per person, which is both
32:41
really cool and also really scary
32:43
if you're running the inference. Obviously
32:46
caching really helps, but
32:49
it's still scarier
32:51
than running a
32:53
server.
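A toy illustration of why that caching matters so much: consecutive tab requests share almost their whole prompt (the file up to the cursor), so only the newly typed suffix is new work. Characters stand in for tokens here, and the set stands in for a real KV/prefix cache; the numbers are illustrative assumptions.

cache: set[str] = set()        # prefixes whose computation we pretend to have kept

def tokens_to_compute(prompt: str) -> int:
    longest = max((len(p) for p in cache if prompt.startswith(p)), default=0)
    cache.add(prompt)
    return len(prompt) - longest           # only the un-cached suffix costs compute

doc = "def fib(n):\n    if n < 2:\n        return n\n    return fib(n - 1) + fib(n - "
print(tokens_to_compute(doc))              # first request pays for everything
print(tokens_to_compute(doc + "2"))        # the next keystroke pays for ~1 token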
32:56
Have there been any surprises as
32:58
you've scaled up this ML infrastructure?
33:01
You've got to be one of the fastest-scaling ML companies ever. Like, have there been any kind of pitfalls, or, I don't know, what's that experience been like? Smooth? Um, there have definitely been glitches, but I think, like, again, the team is really, really talented and we've sort of gotten over it. Nice. What
33:21
about, I mean, we're talking like, you
33:23
know, maybe two weeks after DeepSeek
33:25
came out and then obviously caused investors
33:28
to like change their mind about
33:30
Nvidia stock. Did it, like, update your beliefs at all? It's been really weird to me, because I think we're
33:36
like, both on the Lex pod,
33:38
but also before that we've been pretty
33:41
public about using DeepSeek in many
33:43
ways and we used to use their
33:45
1.5 series models and then switched
33:47
over to their V2 series models.
33:49
So it was like big shock to
33:51
me that like everyone was sort
33:54
of like going, this is some
33:56
new thing, you know, they've been producing
33:58
phenomenal work for a while. Their
34:02
models, like I used to joke, like
34:04
they were one of the three or four
34:06
or five companies that you would trust.
34:08
to, like, produce good models, where the numbers wouldn't feel like they were juiced up, in a way that there were certain models that felt like their numbers had been a little bit too juiced. By juiced, I mean they were really high on evaluations, but then if you used the model in practice, you would never enjoy using the model. It's just very specific to
34:27
some of the evaluations. But
34:30
DeepSeek, I felt like, was very
34:32
honest about things and has been
34:34
producing really good models. So we've
34:36
been running the DeepSeek v2 model for
34:38
eight or 10 months now, probably
34:41
12 months, something like that. on
34:44
our own
34:46
inference. That's the
34:48
model we've scaled up to hundreds of millions of
34:50
calls. Interesting. How
34:52
did you choose it? Was it just
34:54
the best? We knew it was the
34:56
best. They had been producing extremely good
34:58
open code models. We have our
35:01
own post -training stack and we
35:03
do our own stuff. But
35:05
for just picking a really
35:07
well-pre-trained base, DeepSeek
35:09
does a phenomenal job. The
35:12
data they train on is really good, and
35:14
the model is both quite knowledgeable, quite smart,
35:17
and also quite cheap to run for
35:19
the tab in particular. And
35:21
I think in general, I'm really
35:23
excited about DeepSeek v3. I think DeepSeek v3 is actually a really well-pre-trained base for a lot of things. And
35:30
I suspect it will be
35:32
very, very useful for making
35:34
these custom applications. So
35:39
you obviously launched
35:41
agents, and it's
35:43
pretty cool, but it's also kind of
35:45
contained in how many iteration steps
35:47
the agent will do and things like
35:49
that. Where do
35:51
you see agents going? I mean, obviously inference has to get a lot cheaper. It seems like it could go much broader if you wanted it to. Like, what are you thinking? We're super focused on it. I think
36:05
as people have been getting better at doing
36:07
RL, the models have been getting better at both thinking and also being extremely coherent. So I think one of the things that is talked about less is that the models have gotten good at producing tens of thousands of tokens of output, which they were not before. I think they would sort of go into a delusional mode after a couple thousand tokens, and now they've gotten quite a bit more coherent.
36:26
And that comes from doing
36:28
RL and really, really good
36:30
post -training. And
36:33
I think agents were bottlenecked
36:35
by that particular aspect of
36:37
coherency. One of the
36:39
things that makes the Sonnet experience
36:41
really magical for using in an agent
36:43
is that it's so coherent over
36:45
such a long period of time, like
36:48
over tens of tool calls and you
36:52
know, I suspect as the tasks get
36:54
harder and harder, it would need to be coherent over hundreds, if not thousands, of tool calls, and we're working on it.
37:02
One of the things that I think about, like,
37:04
again, like back to the mission of the
37:06
company, the mission of the company is sort of
37:08
what is the... We want to automate as
37:10
much of coding as possible while still having
37:12
the developer in the front seat. And
37:15
automating coding in the short
37:17
term involves, you know,
37:21
allowing developers, in the cases where they want to sit back and let the model code, to do that, but in the cases where they want to drive the editor, to make code themselves. Like, I don't know, you're doing Weights & Biases things and you want to, like,
37:33
switch your GRPC thing to some other TLS
37:35
package in Rust, like you should just be
37:37
able to tell the model like, I want
37:39
to switch my GRPC thing to, to use,
37:42
you know, Rust TLS instead of something else.
37:44
And the model should just get it and
37:46
be able to make these large-scale, codebase-wide changes. And
37:50
that requires the model to have
37:52
some agent type things, because you're never
37:54
going to sit down and write
37:56
out exactly the spec of what you want.
37:58
Then the thing that the agent
38:00
really helps with is you don't have
38:03
to sit down and explain like, yeah,
38:07
we are W&B, we make
38:09
this. We
38:11
have a backend that is written in
38:13
Rust and Go. The Rust hooks up
38:15
to Go in this way. For our library, we use this, and the model should just go and figure
38:19
it out. My
38:21
own experience of playing with agents,
38:23
which is much diminished compared to
38:25
yours, is that when it breaks,
38:28
it's a challenge to debug. Have
38:30
you built any systems internally for just
38:32
even looking at, okay, what is the agent
38:34
doing? Why did it get in a
38:36
weird loop here? What's happening? How
38:38
do you visualize that? Oh,
38:41
we're building our own infra for now.
38:43
I suspect that there will be phenomenal
38:45
products in the future that will make
38:47
this much easier. For
38:50
now, the same thing
38:52
with building prompts. So we used
38:54
this internal library called Quiant. And
38:58
the way we built it was that it was well suited to our own needs and design. And
39:02
I think, for the same reason, with agent infrastructure,
39:04
we'll be building our own infrastructure in the short
39:06
term. And I suspect in the long term,
39:08
there'll be some phenomenal, you know, DevTools that
39:13
will come up to make it much
39:16
easier to both inspect the chains, be
39:18
able to stop at any point and
39:20
restart the chains, be able
39:22
to debug them in production when something weird
39:24
goes wrong, all sorts of things that you
39:26
would need to be able to run like
39:28
a production system at scale. Is
39:31
the agent evaluation like more also
39:33
like a, it sounds like it's more
39:35
of a vibes -based approach than like
39:37
specific metrics? Yeah, so it's pretty clearly vibes-based. I suspect it'll be vibes-based in the short term, and, as we get better at shipping these, it'll become more and more sort of driven by metrics, and you'll be much more operational with
39:51
it. When
39:53
you look at like something like
39:55
a Devin or like these
39:57
sort of like completely automated, like
40:00
no-programmer approaches,
40:02
do you view that as like
40:04
competitive or interesting or like,
40:06
what is your... I think it's interesting. In the medium term,
40:10
if you can actually take your
40:12
hands off and let the
40:14
model drive your entire editor or
40:16
let the model drive the
40:18
entire editing process, I am
40:20
totally open to it. But
40:23
in the case where it's not
40:25
really useful and boring and not
40:27
really that fun, we
40:30
just wait. We
40:33
just wait, we just wait
40:35
until it gets good enough like we keep
40:37
training the models and at some point
40:39
it will get good enough and then it
40:41
will be really fun to use I
40:43
think in general, over a one-to-two-year timeframe, I expect that the way people will code will change. And I think in the short term that seems really scary, but I think it'll be this gradual process, and it'll be extremely natural to everyone coming in, the way coding is changing. I think, for
41:02
example, the transition from not having a copilot to a copilot
41:06
was extremely natural in retrospect. It
41:09
was not something that was scary
41:11
to anyone. It was this thing
41:13
that predicted your next thought, and
41:15
you were like, wow, this is
41:18
phenomenal. And you just started using it. And then the transition from this copilot to this, you
41:26
know, foreground agent interface, where the model does edits across multiple different files, and
41:31
you're like oh I want to switch
41:33
this to use Rust TLS and I want
41:35
to you know make sure that you
41:37
always use HTTP2 and blah blah blah like
41:39
the model gets it and it reads
41:41
all the files and it makes the changes
41:43
and you can immediately review the changes
41:45
very quickly and tell that they're correct, and
41:47
that was also pretty natural I don't
41:49
think there was any point in the middle where people felt disoriented. And I think as that sort of goes to background things, it'll be... all
41:59
these things are always, you know, more, more
42:01
gradual than one would expect. You would have
42:03
expected in 2020 that like if I said,
42:05
the way you'll be coding is you sort
42:07
of start talking to the computer and it'll
42:09
make changes to random files and you'd be
42:12
like kind of freaked out. You'd think, oh,
42:14
it's going to add all these bugs. It's
42:16
going to be impossible to review. Like I
42:18
really enjoy coding. Why the fuck am I
42:20
doing this? Yeah. That like all,
42:22
all these things would have seemed scary. and
42:24
yet, five years in, four years into the language model journey of products, things feel quite natural. So, like, Copilot in 2021 to 2025, where we are now. And at any
42:40
point in time, you know, making the
42:42
change has not felt very disorienting, which
42:44
like maybe in one step it would
42:47
have, but right now it's not really that
42:49
disorienting. Well, it feels like a
42:51
lot of fun to me. I mean, like it's
42:53
like, I guess like when I like connect the dots
42:55
from like 2020 to now. It's
42:57
gone better. It's gone better, right? It's like, yeah, I guess, you know, when I, when
43:06
I look a few years out,
43:08
it's, I have no idea, but it's
43:10
hard not to see that like
43:12
a world where you wouldn't really be
43:14
doing anything that looks like programming
43:16
a few years out, right? Or more
43:18
people will be coding, more people
43:20
will be making much more difficult things.
43:23
like things that are considered much
43:25
more difficult, be it lower
43:27
level things, be
43:29
it larger
43:31
projects, even for their
43:33
side projects. I think people
43:35
are usually very conservative with their side projects because
43:38
they're like, oh, you know, I probably won't have that
43:40
much time. I think people will
43:42
get much less conservative with these side projects.
43:44
I'm generally just extremely optimistic in the medium
43:46
term. Yeah, yeah. Do
43:49
you feel like at all
43:51
like... I
43:54
mean, I guess, first of all,
43:56
don't you think it's a totally different
43:58
world where everyone can do these
44:00
monster side projects easily? That seems
44:02
like software is a very different
44:04
feeling. Even doing
44:06
a software company seems like it
44:08
might be hard to have a
44:11
protected advantage as much, right, when
44:13
it's easy to build this stuff? I
44:16
can't philosophize over that too much. I'm not really scared of people having medium-sized side projects. I think of these things as, like, experimentation becomes much more,
44:28
much more natural. I think a lot
44:30
of the things that large changes
44:33
are usually scary at companies because a
44:35
large change requires changing so many
44:37
pieces and changes so much time that
44:39
you want to plan out everything
44:41
upfront. and then planning is really hard
44:43
because you can't really foresee how
44:45
your production system will look if you
44:47
do XYZ, then everything becomes much
44:50
more scary and then you add more
44:52
meetings and it becomes more formal
44:54
and then everything just becomes worse and
44:56
worse over time. I understand
44:58
it, right? If you're doing a multi-year
45:00
database transition, boy, do you want to
45:02
plan out every single small detail and
45:04
then you want to argue over every
45:07
single small detail. But if
45:09
you can start prototyping these things
45:11
really quickly, Maybe
45:13
it becomes less talking, more coding.
45:16
You have much cleaner concrete
45:18
artifacts. If
45:20
you're in PyTorch and you want to do a
45:22
small API change in PyTorch, it'll take a journey.
45:24
You probably want to debate the hell out
45:26
of it. If you're in
45:28
PyTorch and you can have
45:30
a prototype in three days, maybe
45:33
you should just argue with the
45:35
prototype now. Is that how
45:37
you do things at cursor? Hopefully
45:41
more and more. So yeah, I
45:43
mean, there are still things that are
45:45
scary, but definitely, I think I found myself thinking it's just much better to argue with the prototype. I suspect that that change will continue. Awesome. Well,
45:53
I guess one final question,
45:56
if something comes to mind
45:58
when you think if you were
46:00
sort of outside of cursor
46:02
and kind of like fresh eyes
46:04
into this, you know kind
46:06
of world of AI applications and
46:09
LLMs that are kind of working for so many different things.
46:12
Is there something else that kind of excites you
46:14
that you wish you had time to think about? Personally,
46:18
for me, I've always
46:20
wanted sort of like
46:22
a really good reading
46:24
experience. I
46:26
like to spend my time
46:29
sort of free time either reading
46:31
or spending time even reading
46:33
code bases. I think it's
46:35
sort of this underrated aspect of
46:37
coding that like all of us
46:39
produced some of these artifacts that
46:42
we've poured our many years of
46:44
our life into. Redis, someone has poured their life into Redis. And I really want to go read and understand Redis. What were the hard
46:53
decisions? What were the easy decisions? And
46:56
I think both for reading
46:58
books, for reading papers, and
47:00
for reading code bases, we
47:02
haven't discovered the final optimal
47:04
AI tool. I think
47:07
hopefully cursor will contribute to at least
47:09
reading code bases, but maybe, you know,
47:11
someone makes it easier to read books
47:13
or to read papers, I'll be really
47:15
happy. Like reading papers
47:17
is still like quite an arduous process. I
47:19
mean, PDF viewers, I don't love the current
47:21
PDF viewers. You're still, like, you click a thing and it'll jump you to the final thing. It
47:26
feels like a lot more
47:28
primitive than it should be.
47:32
And, you know, I've recently been reading papers
47:34
by just pasting them into one
47:36
of these sort of chat
47:38
apps, and that's still pretty bad. I think in
47:42
general, it feels like there's a
47:44
lot of low-hanging fruit in lots
47:46
of different areas of life. Okay,
47:50
I got to ask it. What are
47:52
your top recommended reading code bases?
47:58
Well, as I just mentioned, Redis. Redis is
48:00
quite good if you haven't read it. It's
48:03
relatively small. And
48:06
it's still quite fun. Probably that's
48:10
the one that I'd most recommend
48:13
people because it's the thing
48:15
that is used by everyone and
48:17
it's just really, really well -written.
48:19
SQLite, for sure, also, if
48:21
you haven't read SQLite. Again,
48:25
very well -written. It's this coherent document
48:27
by a very small number of people. And
48:30
then I think most of the others recommend like
48:32
software that you use, you should. try to go
48:35
read the software that you use. I
48:37
mean, some things are harder, but like I did. If
48:39
you're a fan of Ghostty, the terminal, maybe
48:41
you should go, go spend a weekend trying to
48:43
read Ghostty, or like if you're a fan
48:45
of PyTorch, maybe you should go look into why
48:47
PyTorch does what it does. I
48:49
think there's a lot of choices
48:51
that you can sort of criticize on
48:53
the outside, and people underappreciate
48:55
the tremendous amount of work that people
48:57
say, on the PyTorch team have
48:59
put in to make PyTorch like really,
49:01
really easy for you to use.
49:03
And there's a magical experience where all
49:05
the sort of gradients flow naturally
49:07
that has taken many tens of thousands
49:09
of engineering hours. I don't know
49:11
if it's in the hundreds of thousands
49:13
or millions, but it's like a
49:15
lot of engineering hours. Interesting.
49:19
Well, thank you so much. I really appreciate your time. Thanks
49:23
so much for listening to this episode
49:25
of Gradient Dissent. Please stay tuned for future
49:27
episodes.