Episode Transcript
0:04
I'm Janna Levin. And I'm
0:06
Steve Strogatz. And this is The Joy
0:08
of Why, a podcast from
0:11
Quanta Magazine, exploring some of the
0:13
biggest unanswered questions in math and
0:15
science today. Steve,
0:20
hi. Hey, Janna. How's it going? Good.
0:23
I wanted to tell you about
0:25
this conversation I had about AI
0:27
and large language models. OK.
0:29
Have you been thinking about AI a lot right now?
0:31
Is it on your mind? Sure. Can't resist. It's
0:33
fun playing with it, and now my
0:36
interest is piqued. Well, it's interesting because
0:38
Quanta actually just published a whole series of articles
0:40
about AI to kind of fill in some of
0:42
the blanks that are out there in the conversation,
0:44
right? Because we're kind of going over the same
0:46
material a lot. Will they replace our jobs and
0:48
what does it mean for creative fields? But there's
0:51
this almost neuroscience of
0:53
AI. How
0:55
do you understand what your AI
0:57
is doing? And that
0:59
really surprised me. You
1:02
would think, well, you built the thing. How come
1:04
you don't know what it's doing? But that's kind
1:06
of like saying I had a child. That
1:09
doesn't mean you have transparency into
1:11
their mind. Right. This feels
1:14
like a real frontier question because we
1:16
keep hearing AIs referred to as black
1:18
boxes. It's as hard as us opening
1:20
the black box of our minds. I
1:23
mean, it's not as though I can explain
1:25
to you the neuroscience of my mind as
1:27
I'm talking to you. Right. I
1:30
don't know how this black box is working. There's
1:32
an old essay by Lewis Thomas at one
1:35
point says something like, if
1:37
I had to do consciously
1:39
what my liver does, I
1:41
would just be vibrating, you know. Right.
1:43
A lot of what we consider consciousness,
1:47
I sometimes think is because we can't
1:49
process that much data. So we need
1:51
the consciousness as a very quick approximation
1:53
so we can do lots of tasks.
1:56
We have to be able to breathe automatically. We have to
1:58
be able to recognize a chair versus a person, instantly
2:01
and loosely, and these have all been
2:04
difficult things to teach an AI. Oh,
2:06
huh. Because of its nature to want
2:08
to be exact. I mean, I guess
2:11
the AI will have to learn. The
2:13
fact that it makes mistakes, to me,
2:15
is almost reassuring. Oh, that's interesting. What
2:18
a cool thought because we so often
2:20
make fun of them for hallucinating and
2:22
that never occurred to me that that
2:25
might be a sign of being on
2:27
the road to real intelligence. I think
2:29
these advances in AI and language, specifically
2:31
with large language models, have been really
2:34
intriguing. So I had the
2:36
chance to talk with Ellie Pavlick. She's
2:38
a computer scientist and linguist at Brown
2:40
University, and she heads this language understanding
2:42
and representation lab, which is trying to
2:44
understand not just language and language models,
2:46
but how they actually work. We had
2:48
a chance to talk about all of
2:50
this. Fantastic. So let's hear from Ellie.
2:55
So, Ellie, welcome to the Joy of Why.
2:57
We're thrilled to have you today. Thank you,
2:59
yeah. This topic is really all over the
3:02
news right now, and it's in our lives,
3:04
actually, this issue of AI. Before
3:06
we get too deep into it, I'm curious
3:09
about your own trajectory. You started in economics
3:11
and you started playing saxophone. How
3:13
did you go from that to studying
3:15
computers and how they encode semantics? I
3:18
always wish I had a really like literary
3:21
answer where like it all comes full circle.
3:23
It's like only because I began where I
3:25
did, could I have ended up where I
3:27
am. Some profound life lesson. Exactly. It turns
3:30
out it wasn't like scripted and perfect. So
3:32
I think the path into CS was very
3:34
much through Econ because I had a research
3:36
gig with a microeconomics professor and the grant
3:39
work I was given was to like make
3:41
plots in MATLAB and that was overwhelming for
3:43
someone with no CS background. And I was
3:45
like, okay, well, maybe I need to learn
3:48
how to code. So I took an intro
3:50
class just so I didn't feel so out
3:52
of my element and there's this very pleasant
3:54
nature to like writing a little thing and
3:56
running it and it works and it does
3:58
what you said. And then I've always thought
4:00
I liked the idea of research. So I
4:02
started working with the one professor who was
4:04
doing language stuff but then really kept working
4:06
with him because he was working more and
4:09
more on semantics, and that resonated. It
4:11
felt like it connected to something I think
4:13
I was always interested in. Slightly the overachiever's
4:15
response, I have to make a plot, therefore
4:17
I must get a degree in computer science.
4:19
I wish it was that, but I think
4:21
it was like absolute confusion about what, like
4:24
I didn't know what skill I was missing.
4:26
What was required? It's just like, I don't
4:28
even understand what's going on. I don't even
4:30
know what question to ask. So I can
4:32
imagine years ago, if you had said to
4:34
somebody, oh, I work on how computers encode
4:36
semantics, at a dinner party, you might have
4:39
ended the conversation. But these days, has the reaction
4:41
changed? Yeah. When you tell people you're working
4:43
on things like large language models. Absolutely. I've
4:45
said this is like a blessing and a
4:47
curse. So I used to say I do
4:49
natural language processing, which is getting computers to
4:51
understand languages like English or Chinese or Spanish
4:53
as opposed to computer languages like Python or
4:56
Java. And yeah, most people were zoned out.
4:58
But now it's like an open invitation to
5:00
talk about all of the kind of
5:02
philosophical questions that are on everyone's mind. And
5:04
we're going to ask you all those too.
5:08
Before we get into the philosophical aspects, which
5:10
I do believe you integrate into your work,
5:12
give us a little synopsis of what it
5:15
is that you do. You said natural language
5:17
processing. You said large language
5:19
models, LLMs. Yeah, so natural language
5:21
processing is like the broader field
5:23
that kind of gave rise to
5:25
LLMs. It could encompass anything that
5:27
involves getting computers to work
5:29
with human language. NLP isn't
5:32
really about the approach you're using. It's
5:34
about the kinds of problems you're trying
5:36
to solve. So before large language models,
5:38
maybe you would have something like a
5:40
sentiment classifier or a spam filter or
5:42
information retrieval like Google search or machine
5:44
translation, right? All of these tasks would
5:46
be NLP and they might use machine
5:48
learning or they might not. And if
5:50
they use machine learning, they might use
5:52
neural networks and deep learning or they
5:54
might not. And so then large language
5:57
models are like one type of model
5:59
that are neural networks predicting the next
6:01
word. And it's turned out that as
6:03
a consequence of building these things, they
6:05
can be used to solve lots of
6:07
different tasks. And so there's this feeling
6:09
that they're subsuming a lot of the
6:11
things that traditionally other models in NLP
6:13
were being created to solve. But definitely,
6:15
I would say NLP is a broad
6:17
field that cares about solving language problems
6:19
using computational tools. Excellent. And then what
6:21
exactly is it that you're looking into
6:24
around things like large language models and
6:26
ChatGPT? Yeah. So right now, when
6:28
I talk about what my lab does,
6:30
we're basically working on large language models.
6:32
The kinds of questions we're really interested
6:34
in is the same questions we would
6:36
have asked about humans and still do
6:38
ask about humans, which is just like,
6:40
how do they represent language such that
6:42
they do the things they do? What
6:44
does it mean to represent language and
6:46
how does that representation of language support
6:49
the various kinds of interesting linguistic behavior
6:51
that we get and other behavior? Now
6:53
that you have language models that produce
6:55
often human-like behavior and then sometimes
6:57
a little bit alien weird behavior, but
6:59
obviously are so linguistic in a way
7:01
that non-human things have never been
7:03
before, it's just interesting to
7:05
ask how they do it and then ask
7:07
in what ways is that the same or
7:10
different from humans and is that a difference
7:12
that really matters for something we might care
7:14
about, like comprehension or meaning. Hmm,
7:17
so let's think about this relationship
7:19
between how these large language models
7:21
are processing language versus how humans
7:24
are. I think that's
7:26
very intriguing. I understand why we
7:28
don't have immediate transparency in how
7:30
humans are processing language. We didn't make
7:33
humans, evolution made humans, and
7:35
we are these black boxes. We can
7:37
interrogate ourselves, we can self-reflect, we
7:40
can analyze other humans. Why is a
7:42
computer a black box if it's human
7:44
made? That is something I think people
7:46
struggle with. What do you mean you
7:49
don't know how it's doing what it's
7:51
doing? You made it. Yeah, it's somewhat
7:53
unique where we are right now that
7:55
it's a computational system that we're treating
7:58
as though it's an organic system, like
8:00
as though it was created by something that
8:02
wasn't us. It's a hard one to answer
8:04
because you really have to answer with some
8:06
kind of an analogy and it's like, what's
8:08
the right analogy? So the direct
8:10
answer is like, well, we understand the actual code we
8:12
wrote, you can go through line by line and say,
8:15
this is what this line of code is doing. But
8:17
what that code is doing is it's
8:19
calling a machine learning program, which means
8:22
it's setting up a set of principles
8:24
and rules, but then the model is
8:26
going to follow these to gradually fit
8:28
patterns of data, right? We understand the
8:30
basic constraints on how that learning happens,
8:32
but you can't then explain exactly the system
8:35
that comes out the other side. In particular,
8:37
you can't explain why the system that comes
8:39
out has the properties and the behaviors it
8:41
does. There's not a direct kind of reduction
8:44
of the behavior you see from an LLM
8:46
to the lines of code and the principles
8:48
that gave rise to it. There's
8:50
different analogies you can play with. One I really
8:52
like is we have a recipe for how to
8:55
make large language models and you can understand the
8:57
recipe, like you know what the steps are that
8:59
you're doing, and you understand at some level, like if
9:01
I don't put baking soda in the cake, it
9:03
will turn out... I actually don't know what'll
9:06
happen. I'm not a very good baker. It'll turn
9:08
out too flat, too chewy, something. And
9:10
you can even do some kind of substitutes like,
9:13
oh, if I don't have eggs, I can use
9:15
smashed banana or whatever. And it'll have these different
9:17
consequences. But that doesn't mean you understand the chemistry.
9:19
You can't precisely say exactly why the cake is
9:21
this exact way that it turned out. And so
9:24
I think that it's an important distinction to me
9:26
between being able to build something or create something
9:28
and understanding how it works. As
9:30
we've moved towards machine learning and deep learning, that
9:33
just pulls those two things apart. So
9:35
the large language model, do I call
9:38
it a computer? It must be a
9:40
network of computers. How do I refer
9:42
to this entity? I don't want to
9:44
anthropomorphize. I actually think this is an
9:46
interesting issue, even in like how to
9:48
talk about them, because they're producing behaviors
9:50
that until recently only humans produced. We
9:52
just don't have the language for talking
9:54
about that thing without using anthropomorphized language.
9:57
So you call them LLMs. I call
9:59
them large language models. And they sometimes
10:01
are on one computer. They're sometimes on
10:03
many computers. It's like a virtual entity.
10:05
It's not a physical entity. It's a
10:07
meta something. So here's this meta black
10:09
box. That's still a mystery.
10:11
Why can't we ask it? Hey, what are
10:14
you doing? How'd you do that? Yeah. So
10:16
we have a complicated mathematical model, the whole
10:18
goal of which is to say, given a
10:20
sequence of words, predict the next word. So
10:23
if I just say, I
10:25
just saw a school bus
10:27
drive past my... house, car,
10:29
yard, whatever, like you can predict what the next word
10:32
might be. And that's primarily what they're optimized
10:34
to do. That's what they're designed to do. And then
10:36
they're doing all kinds of crazy math to support that.
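For readers who want the objective made concrete, here is a minimal sketch of the next-word prediction Pavlick describes. The context sentence, the candidate words, and their probabilities are invented for illustration; in a real large language model these numbers come out of a trained neural network.

```python
import random

# Toy illustration of the next-word objective: given a context, the model
# assigns a probability to every candidate continuation and the next word
# is chosen from that distribution. All values here are made up.

context = "I just saw a school bus drive past my"

# Hand-picked stand-ins for what a trained model would compute.
next_word_probs = {
    "house": 0.45,
    "car": 0.20,
    "yard": 0.15,
    "window": 0.10,
    "piano": 0.10,
}

# Sample one continuation in proportion to its probability.
words = list(next_word_probs)
weights = list(next_word_probs.values())
next_word = random.choices(words, weights=weights, k=1)[0]

print(context, next_word)
```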
10:38
But then if you say something like, why
10:40
did you just say what you
10:42
said, the objective is not to
10:44
faithfully explain why it just said
10:46
what it said, if it even
10:48
knows what "you" refers to here,
10:50
which it doesn't, but instead
10:53
to say what kinds of words are
10:55
likely to come next after that question,
10:57
right? And it's going to be sourcing
10:59
its understanding of what's likely to come
11:01
next from having seen lots and lots
11:04
of data of questions similar to that,
11:06
followed by answers. And so
11:08
that in and of itself is completely
11:10
untethered to any reference to the language
11:12
model's internal state, for example. The way
11:14
the systems are designed and trained, right,
11:17
there's absolutely nothing that constrains its answer
11:19
to this question to be useful or
11:21
correct or accurate. There's nothing that guarantees
11:23
that its explanation of its behavior is
11:25
right, or even that it has anything
11:27
to do with its behavior. And we
11:29
have some studies that look at these
11:32
explanations where we're trying to see how
11:34
much what it explains about its behavior actually
11:36
aligns with what it does, and I've
11:38
just been surprised by the degree to
11:40
which they are inconsistent with each
11:42
other. And we're trying to figure out
11:44
why that is, because there's nothing that
11:47
would objectively require it. It's the same
11:49
kind of argument of like, why can't
11:51
I just ask you, like, how
11:53
your nervous system works, how your brain works?
11:56
You're using it to not know. Like,
11:58
it's your brain that's telling me you don't know
12:00
how your brain works, right? And you're like, what
12:02
do you mean? Of course, the mechanism by which
12:05
the language model doesn't know how it works is
12:07
very different than the mechanism by which humans don't know
12:09
how they work, but it's still this kind of
12:11
point that those two things don't really operate that
12:13
way. Yeah, it does make me wonder if trying
12:16
to crack the neuroscience of how a human mind
12:18
works will be an equally challenging problem in parallel. Are
12:20
you working on neuroscience aspects and how to think
12:22
about this? Yeah, that's the direction I've been super
12:24
excited about. Every time you work with a new
12:27
discipline, it just brings in a whole new set
12:29
of types of ways of thinking about things, terminology,
12:32
insights. So it brings new stuff. There are
12:34
ways in which I think neuroscience is... going
12:36
to be very informative here on certain aspects.
12:38
We often talk in AI and in cognitive
12:40
science about levels of analysis, which is just
12:42
saying there's many different ways to understand the
12:44
system. But it's like this idea that like,
12:46
what level should we be trying to understand
12:48
them at before trying to analogize them to humans?
12:50
Is it more like the brain? Is it
12:52
more like the mind? Is it
12:55
more like society? Is it like a chaotic
12:57
system that's more like multiple people and we're
12:59
looking at emergent behavior because it is trained
13:01
on the whole internet? And
13:03
there's nothing that's like the one true
13:05
analogy. And so neuroscience brings this really
13:07
low level way of thinking about how
13:10
a lot of small numerical operations allow
13:12
certain more complex behaviors to emerge, and
13:14
cognitive science can provide other kinds of
13:17
insights. But we do
13:19
know some things that they're doing, which
13:21
for instance, they're looking at these semantic
13:23
relationships, as you described. They're guessing what
13:25
word comes next, and they're doing this
13:27
mathematically. How is that process achieved
13:29
for them? There's different
13:31
types of math that are relevant here.
13:33
The go-to is like the probabilistic
13:36
model, estimating the probability of
13:38
the next words. And so
13:40
you're just saying, I've seen a set of words
13:42
so far, and I need to encode this into
13:44
some state. And then you're saying, what is the
13:46
probability of a next word given this state? But
13:48
then something that becomes quite complex and one of
13:50
the reasons they are harder to explore is that
13:52
the way of representing that state, it's not like
13:54
the coin flipping example where you say it's either
13:56
heads or it's tails, right? Because there's an infinite
13:58
number of these things. And so
14:00
the way that gets encoded is more
14:03
of a linear algebraic notion or even
14:05
more calculus. It's like this high dimensional
14:08
space where there's a ton of different states
14:10
here and it's really hard to know exactly
14:12
what the shape of this thing is and
14:14
how you move around it. And so this
14:16
is where a lot of the complexity comes
14:18
in. Like on the one hand, we can
14:20
fairly easily think about the probability of next
14:23
word given a state and we can think
14:25
about kind of there are similar states in
14:27
this space and similar states will give rise
14:29
to similar probabilities. There's stuff we
14:31
understand about that, but it's not at a complete
14:33
enough level that we can, for example,
14:35
place guarantees or even predict the behavior
14:37
of a system without just running it.
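As a rough numerical sketch of the point above: a context is encoded as a high-dimensional state vector, a learned map plus a softmax turns that vector into next-word probabilities, and nearby states give similar distributions. The vocabulary, dimensions, and weights below are invented for illustration, not taken from any actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["house", "car", "yard", "window", "piano"]
d = 16                                  # hidden-state dimension (tiny here; huge in real models)
W = rng.normal(size=(len(vocab), d))    # stand-in for learned output weights

def next_word_distribution(state):
    """Softmax over vocabulary scores computed from a hidden-state vector."""
    scores = W @ state
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

state_a = rng.normal(size=d)                      # encoding of one context (made up)
state_b = state_a + 0.01 * rng.normal(size=d)     # a nearby, "similar" state

p_a = next_word_distribution(state_a)
p_b = next_word_distribution(state_b)

# Nearby states give nearly identical probabilities, but there is no short
# list of discrete states (like heads/tails) to enumerate and inspect.
print(dict(zip(vocab, p_a.round(3))))
print("max difference between the two distributions:", float(np.abs(p_a - p_b).max()))
```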
14:40
I know that you've been really careful
14:42
not to invest too much emotion in
14:44
this idea that they're thinking. But
14:46
how can we tell what
14:49
they're understanding or if they
14:51
know the information that's being
14:53
provided? Yeah. I
14:55
wouldn't say I don't invest emotion in this.
14:57
I feel like I've spent a lot of
15:00
time thinking about this and worrying about it
15:02
and caring about it. But I have
15:04
picked a side, because like the thing that I'm
15:06
most excited about in terms of what we
15:08
can get from language models is being forced
15:10
to be precise about what we mean by
15:12
these things. So the thing I'm quite sure
15:14
like, no, they're not human. Like in these
15:17
intangibles that we're thinking about when we ask
15:19
these questions about like meaning and understanding and
15:21
stuff, I don't think they have it. I
15:24
think the thing that's so hard is how intangible
15:26
that thing is. The truth is we don't know
15:28
what those words mean. We don't really know what
15:30
we mean when we say those things. Like
15:33
understanding, meaning, thinking, knowing, like
15:35
any of these very anthropomorphized,
15:37
very loaded words, we
15:39
kind of know how little we understand what
15:41
those things mean because when we talk we
15:44
have to say stuff like, Yeah, they know
15:46
but they don't really know and bank on
15:48
the fact that the person we're talking with
15:50
kind of gets it. Like, these are very
15:52
intuitive concepts, and what LLMs are forcing us
15:55
to do is make them precise and scientific.
15:57
And I think my feeling is as we
15:59
try to do that these words will very
16:01
much fall apart into many smaller concepts that
16:03
can be made precise. So the thing that
16:06
we refer to as knowing or understanding is
16:08
not one thing that you have or you
16:10
don't have. It's like a shorthand
16:13
for a collection of things, one of
16:15
which might just be being human, right?
16:17
Like it might be that part of
16:19
what we mean when we say really
16:21
know or really understand is being a
16:24
human and having all these other properties,
16:26
like making a correct prediction given a
16:28
certain thing and making these inferences and
16:30
behaving consistently across so many states or
16:32
whatever. But I think that none of
16:34
these words are actually, they're just not
16:37
scientific words. And we are like feeling
16:39
obligated as scientists to confront them. So
16:41
the thing I stubbornly push back on
16:43
is saying, whether or not they're
16:45
thinking, because on some aspects of what it
16:47
means to be thinking they are, right? And
16:49
it's actually more productive to say, what are
16:51
we actually going for? What does it mean?
16:53
And very importantly, why does it matter? If
16:56
we're asking it for some technical, practical reason,
16:58
they might be good enough for many cases.
17:00
If we're asking it for some much deeper,
17:02
much more existential reason, then they're probably not.
17:04
But like actually teasing those apart is really
17:06
important. It's interesting to me that you're not
17:08
dismissing it outright. You're not saying, no, it's
17:10
just MATLAB, you know, which is a kind
17:12
of computer code that you can write. But
17:14
you're not doing that right now, which is
17:16
very intriguing. I'm not. And
17:18
definitely not everyone in my field, but
17:20
a lot of people in my field
17:22
really don't reserve anything in the human
17:25
mind that's not computational, right? So saying
17:27
something like it's just math is like
17:29
a weird dismissal. It's not clear to
17:31
me that that same thing couldn't be
17:34
used to dismiss what we would call
17:36
natural intelligence, because almost by definition, somebody
17:38
who's working on trying to understand the
17:40
human mind scientifically thinks that there's ultimately
17:43
some model there. So it's
17:45
like the dismissal on the grounds that
17:47
the thing isn't human and therefore not
17:49
thinking invalidates the whole field that we're
17:51
in. And like, what was the point?
17:53
You look back to when Turing began
17:55
to think about mechanizing thought, which led
17:57
him to algorithms and the idea of
17:59
a universal machine, that is, a computer.
18:01
It used to be that human beings were
18:03
called computers. He also reflected back and
18:05
said, well, you know, we're machines too.
18:08
Our thought is mechanized. I mean, we're
18:10
born out of laws of physics, and
18:12
do you feel that it's feeding back
18:14
into your understanding of human intelligence? You're
18:16
talking about it in a way where
18:18
you've already said things that are very
18:20
provocative along those lines, but is it
18:22
making you think, well, we're kind of
18:24
computational in the way the structure of
18:26
our minds work too. I
18:28
wouldn't say feeding back because... I think
18:30
I thought that originally hence my attraction
18:32
to the field. Again, I think there's
18:34
plenty of people who work in both
18:36
cognitive science and AI who think you
18:38
can make a ton of technological progress
18:41
and never need to go as far
18:43
as saying it's possible to build actual
18:45
intelligence. But many do. Many, whether they
18:47
admit it or not, are drawn for
18:49
a more romantic notion of what it
18:51
is possible to do in AI, which
18:53
is that you think humans ultimately are
18:55
computational things and that there's nothing
18:57
metaphysical to humans that couldn't be replicated
18:59
in a computer. There's actually a lot
19:02
of interesting debates on this about what
19:04
kinds of properties might be inherent to
19:06
a digital computer versus something else. There's
19:08
a lot of room for talking about
19:10
whether the digital computer itself is the
19:12
right medium for replicating human intelligence. I'm
19:14
open to the possibility that that's the
19:16
difference, but I don't have any particular
19:18
data to point to that convinces me
19:20
that's the case. Do
19:24
you have a fundamental belief that things
19:26
are computational, right? Again, it's
19:28
based on nothing, right? This is a personality
19:30
trait. But if you do believe it ultimately
19:33
is, then I think you actually have a
19:35
pretty hard argument to make for why being
19:37
a computer precludes you
19:39
from thinking, right? For why you can say
19:41
it's not thinking because it's just compiling or
19:43
something. I think that's actually a pretty hard
19:45
philosophical argument that I haven't heard made particularly
19:47
well. People are kind of holding out something
19:49
special, which is the human part of what
19:51
we mean when we say something like understanding.
19:55
I love it. Deep question
19:57
there. It's almost like the...
19:59
soul, free will questions, right?
20:02
What is it that's intrinsic about us?
20:04
And is it the mind now? Now
20:06
it's the mind. Yeah, right. It used
20:08
to be that living things had some
20:11
vital essence that made them different from
20:13
non -living things. But when we came
20:15
to believe in atoms and that we're
20:17
all atoms in various states of organization,
20:19
it was hard to see where the
20:21
soul or the vital essence fits in
20:24
there. So now what? We've retreated to
20:26
saying, well, at that level, yes, we're
20:28
all atoms, but intelligence, that's something else.
20:30
Only we get to be intelligent. The
20:32
machines are just doing math. Yeah,
20:35
it sounds like you don't buy it.
20:37
I don't, but I was interested in
20:39
the comment that Ellie makes that maybe
20:41
there's a way out by talking about
20:44
digital versus, I don't know what, analog.
20:47
That somehow that's where we get
20:49
to keep the special ownership
20:51
of intelligence because we're analog. The way
20:53
our neurons work is not exactly digital.
20:55
I mean, she doesn't seem to believe
20:57
that. But if I heard her right,
20:59
it makes it sound like some people
21:01
think that might be the escape hatch.
21:04
Yeah, I get the impression
21:06
she is quite open to
21:08
these digital machines thinking and
21:11
that we're starting to understand
21:13
how to even formulate the
21:15
question. Now, we're being
21:18
pressed by these advances to formulate
21:20
the question better. What
21:22
does it mean to be computational? I
21:24
don't think we're doing something magical.
21:26
We're doing it gooey and maybe
21:29
sloppier, not magically, right? This idea that
21:31
consciousness is this magic kludge for
21:33
the fact that we're not infinitely
21:35
computational, is really interesting to
21:37
me. But I do think the mind
21:39
is computational. And so why couldn't a
21:41
digital machine achieve something like a mind?
21:43
I just wonder if we'll be able
21:45
to recognize it if it will need
21:47
consciousness the way that you and I
21:49
do. That's another question, right? Will it
21:51
recognize it far before we do? Will
21:55
it know it's aware? Will it be having
21:57
conversations? And also even it, even that I'm
21:59
saying it, we're going to have to start
22:01
thinking differently. It's not even a single entity.
22:04
Right, there's multiple computers that can go
22:06
into a single large language model. By
22:09
being in the thick of it, I think
22:11
we're starting to get more precise and also
22:13
realizing, wow, we haven't ever really tackled this.
22:16
Beautiful. Well, there's a lot more
22:18
to contemplate, so think about it during the break
22:20
and we'll be right back. Welcome
22:44
back to the Joy of Why.
22:46
We've been speaking with computer scientist
22:48
Ellie Pavlick about AI, language, and
22:50
the human mind. Now,
22:53
when these large language
22:55
models are first trained
22:57
on these enormous datasets,
23:00
do they continue to learn and develop
23:02
in their relationship, let's say, with the
23:04
user? Or as new ideas
23:06
are fed into the internet? Or are
23:09
they kind of frozen until there's a
23:11
big new training initiative? Everything
23:13
comes down to definitions, right? It kind of
23:15
depends on what you mean by learn and
23:18
develop. There's what we call the weights, which
23:20
is basically it solved some really complicated set
23:22
of equations to be really good at predicting
23:24
next words. And those equations are stored somewhere
23:26
in a file, right? And if you want
23:28
to talk to this particular instance of ChatGPT
23:30
or this particular instance of Claude, you
23:32
basically load those equations from that file and
23:34
that's what you're talking to. And so those
23:36
are called the weights. And often we
23:38
think of updating the weights as being
23:41
this kind of initial learning. And
23:43
there's lots of different ways to update
23:45
those weights. There's updating the weights themselves.
23:47
There's basically adding a little side file
23:49
that tells you how to pretend you
23:52
updated those weights. So that can allow
23:54
you to spawn different models that feel
23:56
like different models. But you could argue
23:58
about whether they're like... clones
24:00
of the same model or they're different models
24:02
and that's a conceptual question but also a
24:04
lot of the things that are being sold
24:06
as learning and adapting have to do with
24:09
storing a side knowledge base that could be
24:11
specific to you. You have a chat with
24:13
the model and say I'm planning my daughter's
24:15
birthday and I have a whole discussion about
24:17
budget and her name and her friend's names
24:19
and who I want to invite and where
24:21
I live and then I come back the
24:23
next day and it like remembers this stuff.
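One way to picture the "side knowledge base" described here, as a hedged sketch: user-specific facts live in a separate per-user store that gets stitched into that user's prompts, while the shared base weights stay untouched. The class and function names below are hypothetical, not any vendor's actual API.

```python
# Hypothetical sketch of per-user memory kept outside the shared weights.
# Nothing here updates the base model; it only changes what this one user's
# prompts look like.

class UserMemory:
    def __init__(self):
        self.facts = []          # e.g. "daughter's name is ...", "budget is ..."

    def remember(self, fact: str) -> None:
        self.facts.append(fact)

    def as_context(self) -> str:
        return "\n".join(f"- {fact}" for fact in self.facts)


def build_prompt(memory: UserMemory, user_message: str) -> str:
    # The frozen base model "remembers" only because the stored facts are
    # prepended to this particular user's conversation.
    return (
        "Known facts about this user:\n"
        f"{memory.as_context()}\n\n"
        f"User: {user_message}\nAssistant:"
    )


memory = UserMemory()
memory.remember("Planning a daughter's birthday party, budget around $200")
memory.remember("Party is next Saturday")

print(build_prompt(memory, "Any ideas for party games?"))
```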
24:25
It's not like everyone who's using Claude
24:27
or ChatGPT now has access to
24:30
my daughter's name and my address. That didn't
24:32
get pushed into the main model, but it
24:34
still feels like it learned or developed because
24:36
it has information now that it didn't have yesterday
24:38
and it's retained that information. So there are different
24:40
mechanisms for models to learn and adapt. And
24:43
depending on the particular tool and endpoint
24:45
you're using, it might be any combination
24:47
of these different things. Yeah, I'm wondering
24:50
if my chat GPT is going to
24:52
behave differently after lots of interaction with
24:54
me than yours will with you, for
24:56
instance. And as though, you know, I
24:58
have my dog and my dog is
25:01
trained to behave a certain way and
25:03
react to me in a certain way.
25:05
I'm sort of wondering if it keeps
25:07
learning and keeps feeding back in that
25:09
way. Yes, there's lots of ways to
25:12
customize a model to you and maybe
25:14
a useful differentiating factor is, like, how easy it
25:16
is to reset the model so that
25:18
we have the same model. In some
25:20
of these versions, if there's like this
25:23
add-on file that contains some information
25:25
about you that this model is reading
25:27
from, maybe some small things that have
25:29
adapted weights, you could basically delete that
25:31
file and get straight back to the
25:34
exact same base model that I have.
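A minimal sketch of the add-on-file idea, under the assumption that the customization is stored as a small separate delta applied on top of frozen base weights: delete the delta and you are back to the exact base model, whereas retraining the base weights themselves has no such easy undo. All names, shapes, and values are invented for illustration.

```python
import numpy as np

# Hypothetical illustration: frozen base weights plus an optional small
# "side file" of adjustments, in the spirit of adapter-style updates.

base_weights = np.ones((4, 4))             # stands in for weights loaded from the released model file

def load_model(adapter_delta=None):
    """Return effective weights: the base weights, plus an optional add-on."""
    weights = base_weights.copy()           # the base file itself is never modified
    if adapter_delta is not None:
        weights += adapter_delta            # "pretend you updated the weights"
    return weights

personalized_delta = 0.05 * np.eye(4)       # a tiny customization stored separately

customized = load_model(personalized_delta)
reset = load_model(adapter_delta=None)      # "deleting" the side file restores the base model exactly

print(np.array_equal(reset, base_weights))  # True: this kind of learning is easy to undo
# By contrast, if base_weights themselves had been retrained on new data,
# there would be no side file to delete to recover yesterday's model.
```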
25:36
There's another version in which like if I
25:39
take ChatGPT yesterday and I train it on
25:41
today's news and it updates the weights, it
25:43
would actually be really hard for me to
25:45
like, get back to yesterday's version. I don't know
25:48
which weights to go and reset. I would
25:50
have to like go retrain the whole thing
25:52
exactly as it was up until I retrained
25:55
it today in order to get back and
25:57
even then it might be hard. And both
25:59
types of things are learning. Both things have
26:01
made a change and allowed the model to
26:04
develop and adapt and stuff. But like some
26:06
of them we can easily undo and others
26:08
you can't, so they're qualitatively very different types
26:10
of learning that probably have different consequences, different
26:13
interpretation. It is fascinating in the human analogy
26:15
where I can teach a group of students
26:17
a subject, even a very mathematical subject, that
26:20
we consider concrete and objective. And
26:22
we don't really understand how they learn it,
26:24
why some understand it more deeply and can
26:26
take it further than what you taught them.
26:29
And it's just fascinating that this is happening
26:31
in parallel in a machine. Absolutely. I think
26:33
an area that I haven't really collaborated with
26:35
yet but would like to is the cognitive
26:37
science of education because there's so much interesting
26:40
about how do humans learn and how do
26:42
we teach them and what's going on there
26:44
and how do people misunderstand things. I think
26:46
there's a lot to be shared in what
26:48
we're thinking about the black box of an
26:51
LLM and the black box of a human
26:53
from education sciences. Fascinating. You
26:55
use large language models as well
26:57
as study them. What's your
26:59
relationship like with these
27:01
large language models? I mostly use them when
27:04
I study them. I've tried to use them
27:06
for a few things. I would
27:08
be embarrassed to be on the record, but I've
27:10
already admitted I recently got tenure and as a
27:12
consequence became involved in administration. Oh, yes. No
27:15
good deed goes unpunished, yes. Exactly. And so as
27:17
soon as I got involved in administration instead of
27:19
research, I was like, oh, I start to see
27:21
the use for large language models. So I tried
27:24
to use it to do things like generate the
27:26
minutes of a faculty meeting, help
27:28
me sort through some data
27:30
I was trying to process and actually
27:32
they weren't good enough like for even
27:34
these very basic tasks. But beyond that
27:37
I haven't actually used them for many
27:39
things in my day-to-day life.
27:41
And I don't know if it's because
27:43
a few experiences weren't quite good enough
27:45
or because I'm like jaded and cynical
27:47
about them despite everything I just said.
27:50
Let's say there was never another update.
27:52
This is it. These are the models
27:54
that we're all gonna be using So
27:56
we trained them on all
27:58
of our examples, for instance, translating
28:00
English to French to Swahili and
28:02
back again. And now
28:04
it's training us. Where
28:07
does that put us in this chain? And
28:09
will we cease to expand? Language modernizes all
28:11
the time. We speak differently than we did
28:13
100 years ago. Are we going to kind
28:16
of freeze in time because we're in a
28:18
loop with something? Now all our students are
28:20
learning to write and speak from the
28:22
ChatGPTs or the Claudes as opposed to the
28:24
other way around. The classic academic answer is
28:26
like nothing is that new. I actually remember
28:29
a talk I saw like early in grad
28:31
school about how basically Google had trained people
28:33
to use keyword searches. And this was an
28:35
example of humans adapting their language to technology, early
28:37
information retrieval, but just to leave out all
28:39
of your words. If you said who was
28:42
Thomas Jefferson's wife, you would just say Thomas
28:44
Jefferson's wife, right? And just scramble it, alphabetize
28:46
it, right? Like that's what got you the
28:48
best results out of the system at the
28:50
time. Now they actually wanted the full language
28:52
back and they were really struggling to get
28:55
people to write full questions. And so there's
28:57
already this example of people talking to a
28:59
computer and adapting their language to get the
29:01
best results out of the computer. And so
29:03
I think you will see this, people are
29:05
getting good at prompting language models and talking
29:08
to language models in this way. I
29:10
haven't yet seen it carry over into how people
29:12
talk to each other. But technology
29:15
definitely does influence how people talk to
29:17
each other. Like my Gen Z students
29:19
say punctuation when they're talking. They'll
29:21
say something like, do you think this
29:23
is a good idea? Question mark? Like
29:25
they'll say that. And I'm like, I
29:28
think this is like a spillover from like texting. It
29:30
almost makes me optimistic. Language has always
29:33
been very dynamic and very responsive to
29:35
the technology and the context. And
29:37
still, I think as long as we continue
29:39
talking to humans as humans, I think it's
29:42
really cool and like cute when you see
29:44
things like people saying the word question mark
29:46
and dot, dot, dot out loud. It's
29:48
like a sign of how plastic and
29:51
dynamic and interesting language is. I
29:53
would worry about the kind of
29:55
collapse of linguistic diversity and innovation
29:57
if people start talking to language
30:00
models almost exclusively. I
30:02
don't know. I guess I'm an optimist. I imagine
30:04
that people do like to talk to people even
30:06
speaking as an introvert who doesn't particularly love talking
30:09
to people. I think that people will continue to
30:11
have human interactions and
30:13
that will save language. I
30:15
appreciated when you pushed back
30:17
at this idea that when
30:19
computers are just doing math,
30:21
that was different than when computers
30:23
create poems or novels or artwork
30:26
or songs. What do
30:28
you think this means for human creativity?
30:30
This is, of course, a question that
30:32
people are semi-panicked about. Yeah.
30:35
So I've been teaching this class this
30:37
semester with a professor at Brown named
30:39
John Cayley, who's a literary artist with poetry
30:42
and other language arts projects and has
30:44
always used technology in the course of
30:46
doing that. And I think it's exactly
30:49
this question about whether humans are mathematical objects.
30:52
Even if you agree or grant that
30:54
some neurons firing in your brain in
30:57
a particular way caused you to
30:59
write this poem, it
31:01
doesn't devalue the poem in a particular
31:03
way. I don't think you have to
31:05
assert divine intervention was involved in the
31:07
creation of the poem to believe that
31:09
the poem itself has aesthetic and artistic
31:12
value. I don't think
31:14
we have to reduce it to the thing that
31:16
created it in a human. And
31:18
even if I understood the brand activations,
31:20
it doesn't mean there's not value in
31:22
analyzing this poetry. And I think the
31:24
same argument can apply to language models.
31:27
There is a way of thinking about
31:29
what they create on
31:31
its face without caring about what math
31:33
and whether it was math that caused
31:35
it. And there's probably room for criticism,
31:37
depending on what you're going for, depending
31:39
on why you care, depending on who
31:41
you're talking to in the context. There's
31:43
a sense in which you can say,
31:46
this came from a language model and
31:48
therefore it's not interesting, it's meaningless, and
31:50
everything in between. But I don't think
31:52
like... being mathematical devalues our creativity in
31:54
any particular way. It reminds me of
31:56
the sort of infinite loops of the
31:58
free will and soul arguments that were
32:01
unresolvable and are still debated and might
32:03
be forever. But here we are and
32:05
we care if people intentionally do harmful
32:07
things or not or intentionally make beautiful
32:09
things. That's just how we are. That's
32:11
the human condition. Exactly. Again, everyone kind
32:13
of relates to these creations differently. But
32:16
like if I'm thinking about the time
32:18
I was like, particularly connected to a
32:20
piece of literature, a piece of art.
32:22
I don't think I spent a ton
32:24
of time thinking about how causal the
32:26
person was in it. Really, sometimes you
32:28
care about the person's story, but I'm
32:30
rarely like hung up on whether this
32:33
was preordained by the universe. Like that's
32:35
not interfering with my ability to appreciate
32:37
it. You can be a physical determinist
32:39
and still appreciate art. Enjoy the Tate
32:41
Modern. So I
32:43
wonder if even though you were thinking
32:46
about these things and deep in this
32:48
subject, if the revelation of the functional
32:50
LLMs that came out practically as tools,
32:52
if you were surprised by them, and
32:54
also, do you feel in a position
32:56
to predict what the future is going
32:59
to be like? How rapid is this
33:01
change going to be? I
33:03
don't think I've been super surprised by the
33:05
technology, but I think I've been a little
33:07
surprised by the pace of the rollout. I
33:10
wouldn't even say surprised because I think it's
33:12
economically driven, not technologically driven, right? It's not
33:14
like the technology is moving faster than I
33:17
realized, or at least not now, maybe
33:19
my early surprise moments were back in
33:21
like 2018, 2019 with what I would
33:24
say were the precursors to the large
33:26
language models. There's one called ELMo, one
33:28
called BERT. There was a little cute
33:30
period where we had a Sesame Street
33:33
theme going. It unfortunately died after a
33:35
stretch of a few models. It was like a
33:37
very exciting time where it felt like research was
33:39
turning a corner. And I think a lot of
33:41
people in academia would point back to that time
33:44
as being like, oh, we're at a pivoting moment
33:46
in NLP. And then there was like the
33:48
ChatGPT moment, which is where it was like suddenly
33:50
pulling back the curtain and like now everyone's involved.
33:52
And so that was a really important time that
33:54
I think surprised me in that pace at which
33:56
then the world was paying attention and reacting and
33:59
then the deployment. It does surprise me how quickly
34:01
people are pushing things out and how willing people
34:03
are. I'm generally an optimist, but it does scare
34:05
me a little bit. I think we're going to
34:07
have a few like, oh crap, moments that could
34:09
have been avoided, right? What would you imagine would
34:11
be a moment like that? I
34:14
could imagine some kind of big
34:16
security things, some kind of
34:18
either intentional or unintentional glitch or
34:20
attack where a lot of systems
34:22
are implicated. AI
34:25
seems like it's lots of different
34:27
technologies, but they're actually all the
34:29
same technology, which makes you think
34:31
there are deeply correlated errors or vulnerabilities.
34:34
There's like a small amount of open
34:36
source software that many things are based
34:38
on. And I mean, it could be
34:40
overblown because a lot of things are
34:42
based on the Linux kernel. And that's
34:44
quite safe. The Linux kernel being free
34:46
Unix, which a lot of our Apples
34:48
run on, this kind of operating system.
34:50
Exactly. It's like kind of core operating
34:52
system code that is then repurposed and
34:54
reused. The Linux was free, right? And
34:56
it was open source and it was
34:58
part of that utopian idealistic movement. And
35:00
obviously could still have bugs in it
35:02
and things, but was like understood in
35:04
a level that is different from large
35:06
language models. I think there's also the
35:08
obvious one that people talk about, which
35:10
is just the proliferation of scams and
35:12
this lack of trust, because if you
35:14
don't know that language is coming from
35:17
a human anymore, you can just fundamentally
35:19
start doubting everything. I've already felt myself
35:21
do this every time I see a
35:23
news story or an image. If I
35:25
didn't see it on mainstream media, then
35:27
I just preface everything with, I haven't
35:29
fact checked it myself. I
35:31
think there are a lot of these things that
35:33
it surprised me how willing people are to try
35:35
things out so far. We go
35:37
right back to it, human beings, man. We
35:39
try to be suspicious and we just kind
35:41
of can't help ourselves. Yeah. Right,
35:44
exactly. There's a question I
35:46
always like to ask of our guests. What about
35:49
your work brings you joy? I'm
35:51
glad we turned to that because I only
35:53
just talked about the pessimistic thing, but
35:55
I think I ultimately am extremely optimistic,
35:58
right? Like, I think the potential value
36:00
of the systems far outweighs the cost.
36:02
A lot of people come into AI
36:05
more as dreamers than anything else. It
36:07
is just very exciting. It's fascinating. There's
36:09
nothing more fascinating than the human mind
36:11
and brain. Of course, we're obsessed with
36:14
this thing. We're a narcissistic species. It's
36:16
like, we're so great. We're so incredible.
36:18
How do we work? Then the concept
36:20
that we would stumble upon something... computational
36:23
that replicates parts of that. Being able to study
36:25
these things and ask questions that seem like they
36:28
don't have answers, but then take them seriously as
36:30
though they do have answers, I feel like it
36:32
feels like a big privilege. Treating these philosophical questions
36:34
as rigorous scientific, concrete questions that you can actually
36:36
make progress on. Yeah, like a lot of people
36:38
get a few late nights in college to like
36:41
think about these things and then you go and
36:43
have a real job where you don't get to
36:45
think about it again. Yeah, that's my whole real
36:47
job and that's wonderful. Ellie, thanks so much for
36:49
joining us. It's been a real pleasure. It's
36:52
a pleasure. What
36:54
a charming take on this that she
36:57
gets to think about what she wanted
36:59
to think about as a college student.
37:01
I think a lot of scientists feel
37:03
this way, that it's a privilege to
37:05
be able to really spend our time
37:07
doing what we want to do. Our
37:09
hobby is our job. Yeah, and hers
37:11
seems to me particularly elusive in the
37:13
science space. It's getting
37:15
so philosophical, right, that
37:17
how do you make progress in the same way
37:20
that you do in science? I
37:22
mean, philosophy can really spin your wheels for
37:24
a very long time. Yeah,
37:27
that makes me wonder, does philosophy
37:29
always turn into science? Just a
37:31
matter of time? There used
37:33
to be a question, how is life
37:35
different from non-life? But after Watson
37:37
and Crick, it started to really look
37:39
like it's going to boil down to
37:41
molecules and atoms. And Bertrand Russell, of
37:43
course, famous British philosopher, also turned to
37:45
science in many ways. I mean, he
37:47
was trying to write a kind of
37:50
mathematical Principia, right? Logic, science were
37:52
involved with things. Sort of setting up what
37:54
Turing did, what Cantor did, what
37:57
Gödel did. I don't know, it's an interesting question.
38:00
You can send all your mail to Steve. Seriously,
38:04
let's just ask what are going to
38:06
be the longest holdouts? For instance, most
38:08
people would say values are not something
38:10
that can be quantified. But I'm not
38:12
even sure about that because with
38:15
morality being studied nowadays through
38:17
evolution of cooperation from a
38:19
biological perspective. I'm not even
38:21
sure that values are outside
38:23
of science. I
38:25
guess I'm espousing what the critics call
38:28
scientism, that it's all just science
38:30
at the bottom, and that's a big naughty thing to
38:32
do, isn't it? Okay, just
38:34
thinking out loud here. I
38:37
feel like you're lost in thought and
38:39
I need to give you some space
38:41
to ponder and process. Always
38:43
great talking to you. Can't wait to see you again.
38:46
This is fun. See you next time. Still
38:50
have questions about AI's impact? Wondering
38:53
how researchers devise experiments or
38:55
how mathematicians think about proofs?
38:57
Head to quantamagazine.org/AI for
38:59
a special series that looks
39:02
beyond prosaic AI-based research
39:04
tools to explore how AI
39:06
is changing, what it means
39:08
to do science, and what
39:10
it means to be a
39:13
scientist. Thanks
39:19
for listening. If you're enjoying The Joy
39:21
of Why and you're not already subscribed,
39:23
hit the subscribe or follow button where
39:25
you're listening. You can also
39:28
leave a review for the
39:30
show. It helps people find
39:32
this podcast. Find articles, newsletters,
39:34
videos, and more at quantamagazine
39:36
.org. The Joy of Why
39:38
is a podcast from Quanta
39:40
Magazine, an editorially independent publication
39:42
supported by the Simons Foundation.
39:45
Funding decisions by the Simons Foundation
39:47
have no influence on the selection
39:49
of topics, guests, or other
39:51
editorial decisions in this podcast or
39:53
in Quanta Magazine. The
39:56
Joy of Y is produced by PRX
39:58
Productions. The production team
40:00
is Caitlin Faulds, Livia Brock, Genevieve
40:02
Sponsler and Merritt Jacob. The
40:05
executive producer of PRX
40:07
Productions is Jocelyn Gonzalez. Edwin
40:10
Ochoa is our project manager. From
40:13
Quanta Magazine, Simon Frantz
40:15
and Samir Patel provide editorial
40:17
guidance with support from
40:19
Matt Carlstrom, Samuel Velasco, Simone Barr,
40:22
and Michael Cagnogolo. Samir
40:24
Patel is Quanta's editor -in -chief.
40:26
Our theme music is from
40:28
APM Music. The episode art is
40:31
by Peter Greenwood and
40:33
our logo is by Jackie
40:35
King and Kristina Armitage. Special
40:37
thanks to the Columbia Journalism School
40:39
and the Cornell Broadcast Studios. I'm
40:42
your host, Janna Levin. If
40:44
you have any questions or
40:47
comments for us, please email
40:49
us at quanta@simonsfoundation.org. Thanks
40:51
for listening.