#189 - Chat.com, FrontierMath, Relaxed Transformers, Trump & AI

Released Sunday, 17th November 2024

Episode Transcript


0:21

Andrey: Hello and welcome to the Last Week in AI podcast where you can hear us chat about what's going on with AI.

0:27

As usual in this episode we will summarize and discuss some of last week's most interesting AI news, and as always you can also go to lastweekin.ai,

0:35

our text newsletter, for even more AI news we won't be covering.

0:41

I am one of your hosts, Andrey Kurenkov.

0:43

My background is that I finished a PhD focusing in AI at Stanford and I now work at a

0:47

Jeremie: generative AI startup. And I'm your other host, Jeremie Harris, um, co-founder and, uh, CEO of Gladstone AI, the AI safety and national security company.

0:55

I guess that's, I mean, we've said that many times, but, but now, now you really, really know.

0:59

I don't know, yeah, Andrey: how many episodes have we done now?

1:01

It must be Approaching a hundred.

1:05

It's almost two years now. Jeremie: Yeah.

1:08

Right. You're right. We've missed a couple, but I mean, it's gotta be knocking on the door of a hundred.

1:12

I remember when we started, it was like in the wake of ChatGPT, or that's when I came on, we'd each been doing separate podcasts in the meantime,

1:20

but, uh, yeah, it just like, all of a sudden everything went crazy.

1:23

Andrey: At least this week won't be too crazy.

1:25

I'll do a quick preview of what we'll be covering.

1:27

So, no huge stories this week.

1:30

We got some neat new features being introduced by OpenAI and Anthropic, and on the business front,

1:36

we got some stories of fun things OpenAI is up to, a few fun open source projects and models this week.

1:45

So I think that'll be interesting. Some research on interpretability and, uh, efficiency for small models, and then policy and safety will be maybe the most meaty section we'll be covering.

1:58

Let's just say we will be covering the implications of Donald Trump's victory for AI.

2:03

And as always talking a little bit about what's going on with China and hardware and the US restrictions.

2:10

Before we get into the news, as always, do you want to acknowledge some listener comments?

2:16

We had a few on YouTube.

2:18

I always like seeing those. One person did say they like the idea of a community or discord.

2:24

So that's interesting. I, I'm, not going to make the call yet, but if we hear a few more, you know, maybe we'll, we'll make it and we can chat about AI news on there.

2:34

And, uh, Jeremy, we did have a comment saying that, uh, a person loved your take on meta and releasing of weights with regards to national security, which I think, I mean, it was.

2:47

Yeah, Jeremie: it was mildly spicy.

2:50

I, by the way, I want to, I want to add just a little, a little modifier to that.

2:54

Um, so the context was like, um, you know, some Chinese companies were shown to be, or sorry, the China, China, the China, China was being shown to use and

3:04

rely on Meta's open source models as a kind of, um, floor to their capabilities.

3:08

Very important. We've known about this for a long time. Obviously when I say we, I mean the world.

3:12

Um, and, uh, you know, And so I basically said, like, I think this is, we're getting to the point where it's indefensible.

3:19

Um, you know, I think one dimension, somebody, um, uh, just discussed this on Twitter with me.

3:25

It was a really good, good tweet.

3:27

And I think something we've talked about earlier on the podcast, but I wanted to resurface here.

3:30

They said, you know, Um, the, uh, the advantage of open source obviously is you could put back doors in these models, um, and thereby, you know, use them as a national security asset, have

3:42

China use Western open source models that have back doors in them that we can then undermine.

3:46

I think there are a variety of reasons why I don't think that's what's actually going on here.

3:50

I don't think Meta is actually doing this strategy, uh, for, for several reasons that, that we could discuss.

3:55

But, um, yeah. I think it would be interesting.

3:58

I think backdoors are going to be really hard to train out because unlearning is notoriously fickle and superficial.

4:03

So I just wanted to call that out. I think an important kind of additional level of detail to flesh that out with.

4:09

So there you go. You can append this to my rant in the last episode if you want.

4:15

Andrey: A little more nuance there, which is always good.

4:19

And, uh, also shout out to a couple more reviews.

4:22

Uh, one of them did say, keep up alignment comments.

4:26

And he even said that we are hitting the Goldilocks zone onto the existential risk talk, which I, I feel pretty proud of.

4:33

I think that's, that's the intent. A lot of work went into that.

4:35

Yeah. Uh, and we did have a critical review, which I appreciate, uh, calling out to Intro AI music, uh, seems that not everyone is a fan, terrible, truly terrible AI generated songs for the intro, which

4:50

I don't know, I, I like them, but maybe I'll keep them to like 15 seconds instead of 30 seconds.

4:57

And as always, I'll put them at the end for people who do enjoy them.

5:01

And one last thing before the news, once again, we do have some sponsors to give a shout out as with the last couple of weeks.

5:10

The first one is The Generator, which is Babson College's interdisciplinary AI lab focused on entrepreneurial AI.

5:18

Babson College is the number one school for entrepreneurship in the U.S.,

5:22

and that has been the case for 30 years.

5:25

And just last fall, professors from all across, uh, Babson, uh, partnered with students to launch this, uh, Generator, which is a lab, uh, that is organized into eight

5:36

groups, such as AI, entrepreneurship, and business innovation, AI, ethics, and society.

5:41

And things like that. And it has now led peer training of faculty all across Bobson.

5:50

Their intent is just to accelerate entrepreneurship, innovation, and creativity with AI.

5:55

So yeah, it's a very cool initiative. We will have a link for you to check it out.

6:01

And one new one, actually, we do have a second sponsor and it is Darren McKee promoting his engaging AI safety book, Uncontrollable.

6:11

The full title of it is Uncontrollable, The Threat of Artificial Superintelligence and the Race to Save the World.

6:18

So if you do like the AI risk talk, I think you might be interested in this book.

6:23

Uh, Max Tegmark, who you would know if you care about AI safety, said that Uncontrollable is a captivating, balanced, and remarkably up to date book on the most important issue of our time.

6:35

It explores topics like uncertainty, control, and risk, and yeah, makes a case.

6:42

that we should be concerned about advanced AI, but it's not a Doomer book.

6:47

It lays out a reasonable case for AI safety and what we can do about it.

6:52

We'll have a link to it on Amazon in the show notes, and it's also on Audible.

6:58

You can just search for it. The title is Uncontrollable.

7:02

Jeremie: Yeah, I've actually, uh, had quite a few conversations with Darren, uh, on this topic too.

7:07

So he's, uh, you know, he thinks a lot about it. He's, he's talked to a lot of people as part of his research for this book.

7:12

So, um, certainly if you're, if you're interested in that, that space, I definitely wanted to pick up and read again, you know, Max Tegmark, one out of

7:19

one Max Tegmark's agree, uh, that this book is, uh, is a book and a great book.

7:25

And maybe the best book probably. Andrey: That's a little preview maybe of what's coming.

7:31

All righty, and now on to the news.

7:33

We are starting as always with tools and apps.

7:36

And the first story is about OpenAI introducing a predicted outputs feature.

7:42

This feature can speed up GPT-4o

7:44

by up to four times for tasks like editing documents or refactoring code.

7:51

So the gist is, uh, many times when you're using LLM, you may only want to tweak your input.

7:57

So you may give it some text or some code and say, you know, uh, correct any grammar mistakes in this document, for instance.

8:06

And so that means that you're mostly going to be spitting out what you take in with just a few tweaks.

8:13

And that is the gist of what this is.

8:15

If you use this, then you can have much faster outputs.
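A minimal sketch of what using this might look like against the API, assuming the prediction parameter as described in OpenAI's announcement; the snippet of code being edited and the instruction are hypothetical placeholders, not from the episode.

```python
# Rough sketch of the predicted outputs feature, assuming the "prediction"
# parameter as described in OpenAI's announcement. The code snippet being
# edited and the instruction are made-up placeholders.
from openai import OpenAI

client = OpenAI()

# Toy stand-in for a real file we want lightly edited.
original_code = "def load(path):\n    return open(path).read()\n"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Rename the function `load` to `load_config` and "
                       "change nothing else:\n\n" + original_code,
        },
    ],
    # Most of the output should match this text, so the API can skip ahead
    # instead of generating every unchanged token from scratch.
    prediction={"type": "content", "content": original_code},
)

print(response.choices[0].message.content)
```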

8:21

For me, it's actually a little surprising.

8:23

It's taken this long for this feature to come out.

8:25

It, I think, is pretty well established as something you can do.

8:30

But nice to see both Anthropic and OpenAI introducing more and more of these

8:36

really developer friendly, you could say

8:38

Jeremie: features. Yeah, this is definitely part of that productization push, right, towards, uh, more and more kind of application specific tooling that OpenAI is focusing on.

8:48

Um, you know, one of the things that is making this possible is speculative decoding.

8:52

This is the technique that, um, it it's been around for a little bit now, but now we're seeing it productized.

8:58

Uh, the basic idea behind it is you get Two different models.

9:01

You have a draft model, basically this like very small, cheap model.

9:04

And at any given time, um, you can get that draft model to propose like, what are the next five tokens or something like that?

9:12

Right. So get it to cheaply produce predictions for those tokens.

9:15

And then what you can do is uh, feed all five of those tokens in parallel to a larger model that would have more expensive computation, but it can, it can handle them in parallel all

9:27

in one forward pass, spending the same amount of compute as it would if it was just like one, one, uh, input that it was trying to process, and so, and then you essentially get out,

9:36

Um, predictions for how accurate the draft models, um, token proposals were.

9:41

And so this allows you to essentially amortize the cost of that more expensive model over a large number of tokens, get it to do sort of

9:48

editing and cleanup, so to speak, um, a lot faster and a lot cheaper.

9:52

So this is, uh, a Practical implementation of speculative decoding, uh, it's, it's one of those things where, you know, it's funny, you read the paper and then a couple of months later, it's

10:00

like, boom, you know, people are putting it into production, actually saving a lot of money.

10:03

So, um, this is, this is the whole idea.

10:05

Another advantage of course, is you don't have the problem that the model might, you know, hallucinate over the stuff that should stay solid.

10:12

Like if you, if you have, you know, some small part of like a JSON file or something that you want to tweak and you want the rest of the file to be

10:18

anchored to be exactly the same, then this allows you to do that, right?

10:21

It allows you to, to fix. So what they're doing during speculative decoding is they're actually here fixing the part of the output, output that should be fixed and only having the large and expensive

10:31

model make those predictions, presumably on the, the variable parts of that output.

10:35

So This is, um, a bit of a janky reimagining of what speculative decoding, decoding looks like with this added constraint that, you know, the stuff before and after this window that

10:46

you're actually going to try to sample in, um, is, is, uh, is kind of concrete is locked in.

10:51

So, um. I think kind of cool. Um, I'm curious about the economics, what they are doing, by the way, is they're only charging you for the tokens that are actually getting kind of

11:00

modded in the middle, let's say wherever you want the modifications to occur.

11:04

So that seems fair, right? You're giving a strong prior on like, keep the beginning and the end, say the same.

11:09

So don't charge me for generating those tokens only charge me for generating the ones that I care about, which again makes a lot of economic sense.
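A toy sketch of the propose-and-verify loop being described here. The two "models" are stand-in Python functions, not real LLMs; in a real system the draft model is a small LM, the target model is the large one, and the verification step is a single batched forward pass.

```python
# Toy sketch of speculative decoding with greedy acceptance. The "models"
# here are made-up stand-ins just to show the control flow.

def draft_model(context):
    """Cheap draft model: propose the next token (silly heuristic)."""
    return context[-1] + 1

def target_model(contexts):
    """Expensive target model: score many contexts in one call, returning
    the token it would have produced for each context."""
    return [ctx[-1] + 1 if ctx[-1] % 7 else ctx[-1] * 2 for ctx in contexts]

def speculative_decode(prompt, steps=20, k=5):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + steps:
        # 1) Draft model cheaply proposes k tokens, one after another.
        proposed, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_model(ctx)
            proposed.append(t)
            ctx.append(t)

        # 2) Target model "verifies" all k positions in one batched call.
        batch = [tokens + proposed[:i] for i in range(k)]
        target_tokens = target_model(batch)

        # 3) Accept draft tokens until the first disagreement, then take
        #    the target model's token there and start a new round.
        for i in range(k):
            if proposed[i] == target_tokens[i]:
                tokens.append(proposed[i])
            else:
                tokens.append(target_tokens[i])
                break
    return tokens

print(speculative_decode([1, 2, 3]))
```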

11:16

Andrey: That's right. And, uh, there was also, uh, I guess a.

11:19

Partnership of Factory AI with OpenAI to test this new feature in their API, and they have a few metrics.

11:28

It's not like, you know, there's no benchmark that they report here, but they do have some numbers of they did find in practice.

11:36

Two to four times faster response times while maintaining accuracy.

11:41

And they have examples of large files that would take 70 seconds now taking, uh, 20 seconds roughly.

11:49

So yeah, very easy to see how this is useful in practice for various applications.

11:55

Next up, we are moving to Anthropic and a price increase for Haiku 3.5.

12:02

It costs four times more than its predecessor.

12:07

Uh, the claim I think is that the price hike is at least partially because Haiku 3.5

12:13

is superior to the previous version, but, uh, rather surprising.

12:19

So it's $1 per million input tokens and $5 per million

12:25

output tokens. And that's again, four times more than the previous one.

12:30

Jeremie: Yeah. It's also almost 10 times more expensive than GPT-4o

12:33

mini, right?

12:35

So when, when we look at that, like that's pretty remarkable, right?

12:38

It's, it's, um, in fact, it's only two and a half times cheaper than the full GPT-4o.

12:43

So GPT-4o mini was supposed to be the Haiku, say, of the OpenAI series, of the GPT-4o

12:49

series. Right. And so here we have essentially a model, Haiku, that's coming out and saying, Hey, I'm still the small one, except I'm now going to cost you, you know, something closer to the full model size.

13:00

That's a really interesting play. And I think this speaks to something very interesting that's happening with the economics of these models, right?

13:06

Like one of the big questions has been, we've talked about a lot here, but, um, To what extent do LLMs just get commoditized right to the point where the margins go to zero?

13:15

Like your model is basically the same as their model is basically the same as your other competitor's model.

13:20

And so everybody has to just basically price based on the raw cost pretty much of producing the model and serving it.

13:28

And at that point, your, your profits go to zero, or, you know, this is kind of what happens economically.

13:33

And one of the challenges is You can't do that and build enough bank to spend for the next generation of massive multi billion dollar data

13:41

centers if you're just living a hand to mouth existence like this.

13:44

So a bit of a structural problem. This is the first time we've seen that trend bucked where we've seen a model come out and say, Hey, you know what?

13:50

On the basis of my higher quality, I'm going to up the, the, the, the cost associated with using this model.

13:55

You know, a lot of developers, um, some might say fairly understandably are coming back and saying, Hey, you know, this is an unwelcome development.

14:02

Um, not necessarily because of the price increase per But because the framing, um, is, is that, Hey, the, this is better quality.

14:11

So therefore we're charging you more. Um, this is really interesting, right?

14:14

There's this classic thing when you do startups, when you do, I get me, it's more broad than that.

14:18

It's economics really. Uh, when you want, when you're trying to sell something and make a profit, uh, value based pricing is the thing you go with, right?

14:26

You, you advertise how much value you can bring to the customer, how good your product rather than talking about.

14:32

Uh, the cost, the, when you talk about how much it costs you to make a thing, that's where you, that's a hint that your whole industry has been commoditized, right?

14:40

So when you go out to McDonald's and you say like, Hey, well, can you give me the same burger just a buck cheaper?

14:45

They'll tell you like, no, the patty costs this much, the bun costs this much, the cashier's time costs this much.

14:50

So therefore I have to sell you this bread. They probably won't do that.

14:52

They'll probably tell you to leave, but whatever. Um, so. Um, they'll literally tell you, sir, this is a Wendy's, uh, anyway, um, the, uh, but you kind of get it right when, when you're dealing with a commoditized industry where everybody

15:02

can basically offer you the same product, your margins go to zero, you argue based on cost.

15:06

This is different. Claude, er, Anthropic is coming out and saying 3.5

15:10

Haiku is higher quality. Therefore we'll charge you more.

15:13

People pushing back on that is an indication that.

15:15

Well, actually this space is pretty commoditized, you know, like anyway, I think this is a really interesting tell.

15:21

Um, one of the big consequences, by the way, of all this stuff, as you see prices going up and down and side to side, and you've got new products coming online, um, it really makes a lot of sense.

15:31

If you're a company working in this space to have the.

15:34

Uh, the scaffold you need to very quickly assess through automated evaluations, whether the task you care about is being performed well by a given LLM.

15:44

So a new LLM comes online with a new price point.

15:46

You should be able to very quickly and efficiently assess does this LLM, At this price point, at this quality makes sense for my use case, right?

15:54

If you can't do that, then you can't ride efficiently this wave of lower and lower LLM prices.

16:00

You're not going to benefit from that in your product. So just to kind of, I guess, side thought there, you know, really important for, for companies to.

16:07

To get into the habit of checking these latest models, because there are companies for whom Haiku 3.5

16:11

is going to be way, way better than the other options.

16:15

Um, but the question is, what are you competing against?

16:17

Are you competing against GPT-4o or are you competing against GPT-4o

16:20

mini? And you know, right now we're somewhere in between.
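A hedged sketch of the kind of automated check being described: run a fixed task suite against each candidate model and weigh accuracy against price. The model names, prices, tasks, and grader here are placeholders, not real benchmark results.

```python
# Sketch of a minimal eval harness for comparing candidate LLMs on your own
# task, as discussed above. Prices, tasks, and the grader are placeholders.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    usd_per_million_input: float
    usd_per_million_output: float

def run_task(model_name: str, prompt: str) -> str:
    # Placeholder: call your provider's API here and return the completion.
    return "stub answer from " + model_name

def grade(expected: str, actual: str) -> bool:
    # Placeholder grader; real evals might use exact match, regex, or an LLM judge.
    return expected.lower() in actual.lower()

TASKS = [("What is 2 + 2?", "4"), ("Name the capital of France.", "Paris")]

def evaluate(candidate: Candidate) -> float:
    correct = sum(grade(expected, run_task(candidate.name, prompt))
                  for prompt, expected in TASKS)
    return correct / len(TASKS)

candidates = [
    Candidate("claude-3-5-haiku", 1.00, 5.00),  # pricing as discussed above
    Candidate("gpt-4o-mini", 0.15, 0.60),       # placeholder pricing
]

for c in candidates:
    print(c.name, "accuracy:", evaluate(c), "input $/M:", c.usd_per_million_input)
```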

16:23

Andrey: This is, uh, yeah, I think to me a little surprising.

16:26

Uh, the announcement of 3.5

16:29

Haiku was at the same time as 3.5 Sonnet, which we covered, I think, about two weeks ago now.

16:34

And it was just this past week that they announced a price change.

16:38

And that is what led to people responding.

16:40

I mean, you know, a four, four times

16:43

uh, raise in price is pretty dramatic, so it must be a mix of, it was underpriced to begin with, perhaps like significantly underpriced.

16:55

Uh, and I guess there's also perhaps a factor of them just emphasizing 3.5

17:01

Sonnet as the main one they want to compete with going forward.

17:05

I don't know. Yeah. It's a, it's a certainly an interesting move from a competitive perspective.

17:10

On to the lightning round, we are starting with Flux 1.1

17:15

Pro Ultra and Raw.

17:18

So, Flux 1. 1 Pro from Black Forest Labs, one of the leading AI image generator providers, has now been upgraded to support 4x higher image resolution, up to 4x higher resolution.

17:32

4 megapixels (that's what MP stands for), so really high resolution.

17:37

And it still has faster generation times of 10 seconds per sample.

17:43

And this is priced at just 6 cents per image.

17:48

And they do have this raw mode as well, which just leads to kind of realistic looking images.

17:55

more akin to photography. So, yeah, I guess not too surprising.

18:00

We keep getting better and better models, more and more realistic, but I think we're keeping up with black forest labs and, you know, they're moving pretty rapidly in the space.

18:10

Jeremie: Yeah. And they're the ones who, if memory serves, partnered up with, uh, with X, uh, you know, formerly known as Twitter, uh, to, uh, support the

18:17

Grok app and the image generation functionality that they're developing.

18:20

So, uh, you know, this is. This is them continuing to pump out their own independent product line, which I don't know, may be integrated as well with, with Grok at some point.

18:29

Um, yeah, looking at the images again, I mean, I, I find myself continually saying this, I'm not a, an image guy.

18:34

So I, like, I don't know the, um, you know, the, the, the kind of, uh, the, the, the aspects of image generation, let's say that are of greatest interest to people

18:44

who really dig into the space, but the images look like they're really high quality.

18:48

The raw mode especially does look really gritty and real.

18:50

Um, Because I'm a bit of a buffoon in this space, I kind of look at these and go, uh, cool.

18:56

I feel like I've seen a lot of other models that have the same quality.

19:00

Um, so I'm kind of not sure, you know, where the, where the moat is in this space, but, but still, um, does look impressive.

19:06

And, uh, flux has kind of come out of nowhere too, uh, with these new models.

19:11

Andrey: And speaking of X and Grok, we have a bit of a story on that.

19:15

X is testing a free version of a Grok chatbot in some regions.

19:21

So this was previously exclusive to premium and premium plus users of X.

19:27

And now there is a free tier where you can do 10 questions in two hours for the Grok 2 model and 20 for the Grok 2

19:38

mini model, plus a few image analysis questions per day.

19:43

So, uh, you do have to sign up to X, of course, and you do need to have a linked phone number.

19:49

But, uh, certainly, you know, this is something that you have in chat GPT, I think also in Anthropic, the ability to use the chat bots for free.

19:58

So this is just being tested in New Zealand now, but it'll be interesting to see if it continues expanding

20:06

Jeremie: to more users. Yeah, and obviously a big goal anytime you launch something for free like this is to collect user data, right?

20:12

Upvotes and downvotes for, say, RLHF or something else.

20:15

Um, and, uh, and also just to own more mindshare.

20:18

I think one of the things that OpenAI continues to enjoy a massive lead on is the fact that ChatGPT is a household name, whereas, you know, Claude is not.

20:26

Um, and Grok increasingly is becoming one, but that's only thanks to the distribution they get through X.

20:32

And so I think, um, you know, at this point you combine.

20:35

The, the X distribution factor with the, the X factor, if you will, uh, with, uh, with the fact that this is free, that could be really interesting, but the, the code is interesting too, right?

20:44

Like a, a query quota of 10 questions within two hours, I don't know about you, but when I'm sitting down with like with Claude, which is, you know, for some

20:52

of the work that I do, I tend to spend quite a bit of time with Claude actually.

20:55

Um, There are long sessions and there's a lot of back and forth and there's a lot of like going back and editing questions and you know tweaking prompts so uh that

21:03

that quota might be challenging for some of the heavier use cases which makes sense.

21:07

Andrey: Yeah, this feels like, you know, you want to give people a taste, so to speak, so that people might consider subscribing to X, which, probably hard to say,

21:19

I'm not sure if Grok will convince people who aren't subscribers to do so,

21:23

Jeremie: but you know, maybe. No, you're right. I mean, there's value in bundling it on with, with X, right?

21:28

Like I was going to say, there are other free chat platforms that give, you know, that don't give you a limit, but the fact of the X integration, that distribution is so, so key.

21:35

And I think it's still probably being underrated.

21:38

So. We'll see. Andrey: Moving on to applications and business.

21:41

Speaking of chatbots, we have a kind of fun story, not a very consequential story, but, uh, one that is neat.

21:49

Uh, OpenAI has acquired the domain chat.com.

21:54

Uh, and I, we don't know the exact details of how much it cost, but it appears to have cost a lot, like in the maybe, uh, 10 million ish.

22:05

Uh, we know that it was previously acquired by HubSpot co-founder Dharmesh Shah for 15.5

22:13

million. I think just roughly two years ago or so.

22:17

And it has now been revealed that he sold chat.com

22:21

to OpenAI, and, uh, Sam Altman on X tweeted or posted just chat.com.

22:27

That was the entire post, I guess, showing off.

22:31

So, uh, it's not yet, uh, I guess been promoted heavily.

22:37

There's no new brand. Uh, it's still called ChatGPT, but you know, I mean, 10 million for a URL, that's pretty significant.

22:45

Jeremie: Yeah. I mean, if it were 10 million, uh, that would be a haircut on the initial acquisition cost of 15.5

22:51

million, which, uh, you know, it's pretty significant, but, uh, from, from the context, it seems like something more interesting, maybe going on here.

22:59

It seems apparently as if, so Dharmesh Shah, the, uh, the guy who acquired it, uh, may have been paid in OpenAI shares.

23:07

So if that's the case, that would be kind of interesting too.

23:10

Uh, he had this somewhat cryptic post on X.

23:13

Um, all of this is very cryptic. It's the most cryptic launch of a new domain I've ever seen.

23:18

Um, but if you do go to chat.com, you will see, of course, the, uh, right now the ChatGPT-4o, uh, interface.

23:26

So there you go. Andrey: Right. Yeah.

23:28

To emphasize 10 million. We don't know if that even is of a ballpark.

23:32

That's just based on what was previously paid.

23:35

You would expect it to be, you know, around that, maybe. Next up, a more serious story, and it is that Saudis are planning a 100 billion AI powerhouse to rival the UAE tech hub.

23:50

So this is Saudi Arabia, of course, and it's planning this artificial intelligence project to Yeah, pretty much develop a technological hub to rival that of the United Arab Emirates.

24:03

This will be used to invest in data centers, startups, and other infrastructure.

24:08

It's, uh, titled the initiative project is called project transcendence, which is pretty fun.

24:14

Not, not dystopic at all. Yep. Well, you know, pretty ambitious, you could say.

24:19

And of course this will also be used to recruit talent to the region, which I'm guessing is perhaps not quite as prevalent there as in the U S or elsewhere.

24:31

So yeah, we, we've covered in the past how the UAE has invested significantly.

24:36

There've been developments from the region like with Falcon models that are pretty notable at the time.

24:43

Don't know that we've had too much to cover.

24:45

In recent times from the UAE, but, uh, certainly it's true that, uh, these countries are trying to invest

24:54

Jeremie: and be a player in the space. Yeah.

24:56

I mean, I think the, the biggest kind of recent stuff with the UAE has been infrastructure kind of structural stuff with G42 and the questions around, you know, can they decouple from Huawei

25:04

technology and Chinese tech and, you know, the department of commerce getting involved there.

25:07

So really the question about where Where's the, the future of say AGI training run scale data centers, where is that going to be?

25:15

And, and this idea that, you know, the UAE has this massive energy advantage, which is a big part of the reason and capital, which is a big part of the reason why so many people are

25:23

interested in it as a, as a hotbed, as a, as a place to build out the, this infrastructure.

25:28

Um, this is Saudi Arabia basically saying, Hey, wait a minute, uh, we're a giant oil rich nation with.

25:34

Deep concerns over how much longer that oil is going to hold up and be viable.

25:39

And so they're looking for ways to diversify out of that industry.

25:43

And well, guess what? Oil comes with, you know, that awful lot of, uh, of energy and, and that's, that's great.

25:48

So it gives them a lot of the ingredients they need again, the money and the energy to potentially seed something like this.

25:53

They already have a sort of similar structures, let's say to project transcendence.

25:58

There's a, a company, a sort of, um, state-backed entity called Alat.

26:02

Uh, that's a fund that does sustainable manufacturing.

26:05

It's got a hundred billion dollars in backing. That's about the order of what's speculated could be associated, could be associated with project transcendence.

26:12

We don't know yet how much actually will be forked over.

26:15

Um, but there are discussions with potential partners, uh, which include, I think I saw market or sorry, Andreessen Horowitz.

26:21

Yeah, that's right. Um, yeah. So apparently A16Z is talking with, uh, this, um, the public investment fund, which is sort of the, the state, uh, kind of entity that would be overseeing all this.

26:33

Um, so that's, it's kind of interesting. I mean, a Western private, uh, private actor looking at that, Apparently the, the, uh, fund itself is maybe growing to as large as 40 billion in commitments again, aiming for that 50

26:45

to a hundred billion in total, which would be, which would be pretty, uh, pretty impressive.

26:49

But keep in mind that is like about what a year of Microsoft infrastructure spend.

26:55

Um, and the, the challenge here is that the buildout for this is, is slated for like, you know, 2030, there are a whole bunch of problems right now plaguing Saudi Arabia on this front as well.

27:04

They've seen an overheating economy that's now causing them to claw back some of their previous commitments, um, to, to do similar, uh, buildouts in other

27:11

tech sectors too, um, including semiconductors and, and like smart power.

27:15

You know, smart everything basically. Uh, so, you know, now there's a little bit of uncertainty about the future of some of those projects.

27:22

This one certainly has a lot of truth, uh, buzz around it.

27:26

So, you know, see, see where that ends up going. Um, and, and by the way, you could did a little digging.

27:31

We, what kind of history does Saudi Arabia have in the LLM space?

27:34

I was not tracking this, but there was a 7 billion parameter model.

27:38

It's the only one I've been able to find so far, but, um, for, you know, take it for what it's worth.

27:42

Uh, there's a tech company called Wattad that apparently built this model called Moolhem, and it was a Saudi Arabian domain-specific LLM that was trained exclusively on Saudi data sets.

27:52

So a bit of a scaling scaling issue there in terms of getting beyond that.

27:56

But um, uh, there you go. So they have a, you know, a small footprint in this space, obviously hoping to attract it.

28:01

talent, which is going to be a really, really important resource.

28:04

Um, and I think that that's going to be a challenge for, for both, um, Saudi and frankly, the, the UAE as well.

28:10

Um, at least on the model development side, the infrastructure side, I think might be a bit of an easier play.

28:15

Andrey: Yeah. So good call out there. This is.

28:18

saying a backing of as much as a hundred billion and this is a people familiar with a matter kind of article.

28:25

So yeah, not too many concrete details there.

28:28

On to the lightning round, the first story is again on OpenAI, but this time it's about hardware.

28:34

And it's that Meta's former hardware lead for Project Orion is joining OpenAI.

28:40

So this is Caitlin Kalinowski, who was the former head of Meta's AR glasses team, and has also worked on VR projects, uh, and also worked on MacBook hardware at Apple.

28:53

is now joining OpenAI seemingly to focus on robotics and partnerships to integrate it, uh, integrate AI into physical products.

29:03

We covered, uh, pretty recently how, uh, OpenAI did start recruiting for robotics positions with the descriptions of a job having to do with integrating ChatGPT into robots.

29:15

We did see, uh, Figure, the developer of a humanoid robot, showcase their robot working with ChatGPT, having conversations and being told to do stuff.

29:27

So perhaps this is, uh, this recruitment points to OpenAI

29:32

wanting to do more of that. Jeremie: There's a lot of reading tea leaves, especially this week with OpenAI and its hires, you know, there's a, so, so apparently one of the, part of the speculation in this article is

29:43

that Kalinowski, um, is there to kind of, to work with LoveFrom, her old boss, Jony Ive.

29:49

We've talked about Jony Ive, um, partnering.

29:52

So he was the designer of course, of the iPhone.

29:54

Um, now he's been brought on board to OpenAI, uh, to launch, as he put it, a product that uses AI to create a computing experience that is less socially disruptive than the iPhone.

30:05

Um, so I, I couldn't quite interpret.

30:07

What he was saying there is, was he saying it's going to be less, less horrible socially than the iPhone was, or it's going to be less of a

30:14

game changer than the iPhone was, um, probably he meant the former.

30:18

I'm not sure. But anyway, uh, so apparently she'll be back working with him.

30:22

So that's sort of a, a natural, a natural partnership there.

30:25

Um, she has a lot of experience doing design and Apple as well.

30:28

Really, really unhelpful, I will say, of OpenAI to have two separate media threads that involve the word Orion, because, uh, there's this model we'll talk about, right?

30:39

The speculative model, we talked about the rumors of the model Orion, and now you have the former Orion lead from Meta, different things coming to OpenAI.

30:46

I really wish that they would keep their headlines a little bit, a little bit straighter, but.

30:50

Andrey: Yeah. And why, why Orion? You know, be a little more original,

30:54

okay, in your, uh, project names. Also worth mentioning, OpenAI

30:59

did acquire a company building webcams earlier this year, I believe.

31:04

So could play into that. We don't know.

31:06

This is just, uh, we don't know what they're doing here.

31:09

Jeremie: It's, it's also an interesting about face, right? Cause like they, they did, they disbanded their entire robotics team.

31:13

This is like four years ago and now they're really rebuilding it.

31:16

But it does seem that the new robotics team.

31:18

Is a lot more, um, market focused, like product focused and so that in itself is, is sort of interesting.

31:25

You know, there are pros and cons there. They'll get a lot more real world feedback by having their systems out there and, and more interesting data.

31:31

But, um, yeah, anyway, so, uh, the structure of OpenAI continues to tilt towards more and more of a product oriented org.

31:38

Andrey: And just one last story on OpenAI.

31:41

This one is, I guess, a fun one as well.

31:44

Uh, and it is that OpenAI accidentally leaked access to the upcoming O1 model to anyone by going to a certain web address.

31:54

So this was accidentally leaked, uh, in the sense that users could access it by altering a URL.

32:02

for a brief period of time. It was shut down after two hours.

32:06

I think maybe when people were aware of it or something.

32:09

So we have the preview model of o1 that you can use, but still we don't have access to the full o1 version.

32:17

Now, yeah, people were able to play around with it.

32:21

OpenAI actually confirmed that this was the case, uh, and said that, uh, there was not too much access since this was resolved.

32:30

So people play around with it. And as you might expect, they'd say that it was pretty impressive.

32:36

Jeremie: Yeah. OpenAI at least said that they were preparing limited external access to the OpenAI o1 model and ran into an issue.

32:44

So I guess in the process of trying to give people, you know, maybe special links to access it.

32:49

Um, it, it leaked in that way. I still think so.

32:52

So by the way, some of the demos are kind of interesting.

32:55

Uh, there's a classic one where, you know, you have this like image that is an image of a triangle and it's subdivided with a whole bunch of lines.

33:02

And then those lines form sub triangles within the image.

33:04

And then you ask how many triangles are there in the image?

33:07

Um, standard. Multimodal LLMs really struggle with this.

33:10

Uh, in fact, the preview version of o1 struggled with this and got the answer wrong.

33:15

The new version did not. Um, so, you know, one of these little things where, you know, maybe a bellwether, uh, eval or something like that, who knows.

33:22

Um, but I think one of the most interesting aspects of this, apart from the fact that it teaches us quite a bit about, um, OpenAI's continued struggles with security, it must be said.

33:32

Um, you know, this is, this is an organization that, uh, Explicitly has said that they are trying to prevent people from seeing the full reasoning traces of o1 because that is

33:41

critical intellectual property for them. Well, guess what, this, this o1 version, the full o1 version which was leaked to begin with, also leaked out a full chain of thought when it

33:52

was asked to analyze in one case a picture of a recent SpaceX launch and then other other things in other other cases So for this sort of critical Um, uh, competitive secret, really.

34:03

And that's what it is. Uh, the reason OpenAI didn't want to release that, uh, those chains of thought initially was precisely because they were concerned that those chains of thought would, would be

34:14

really valuable training data for people to replicate what is so precious about this model series.

34:18

And so, you know, here they are kind of leaking it out themselves with this, uh, you know, haphazard, Uh, haphazard launch.

34:24

So doesn't really inspire a lot of confidence in OpenAI's security approach.

34:28

Their philosophy, really frankly, the level of effort that they're putting, uh, into this.

34:32

I know it sounds like a small thing, but when you're dealing with, you know, the stakes as they may potentially present themselves in the future or national security, otherwise like.

34:40

This is not a small screw up, um, and it could have been mined.

34:44

If you imagine it's not an individual who's accessing this, it's an AI

34:47

agent or something, and it's collecting, you know, using the opportunity to collect a bunch of training data, not saying you could

34:52

do a ton of it in that time, but this is an important vulnerability.

34:55

And, um, anyway, so, uh, kind of, uh, kind of amusing and a little disappointing, especially given that OpenAI has made such a big public, um, show of, of trying to get into the security game more.

35:08

Andrey: And just one little caveat with regards to a full chain of thoughts.

35:12

Uh, we don't know for sure if that's the case, uh, one Twitter user reported seeing it, uh, but that may or may not have been the full, full

35:23

chain of thought. It was just a detailed response that did

35:25

Uh, include some of the reasoning

35:27

Jeremie: steps, so yeah, no, that's fair enough. It did look different enough.

35:31

Yeah, you're right. It did look materially different enough, um, from the sort of standard reasoning trace that's put out and similar enough to the one that the reasoning traces that were shared,

35:40

that OpenAI did share right when they launched, that it's like very suspiciously alike.

35:45

Andrey: It seems like at least it's similar to what it's doing internally.

35:49

Yeah, yeah. And one last story, NVIDIA is once again even more valuable than before.

35:57

This time it is the largest company in the world.

36:02

It has surpassed Apple on Tuesday.

36:05

I don't know what happened on Tuesday, I guess. Find out.

36:09

So the shares rose 2.9%,

36:13

uh, leading to a market capitalization of 3.43

36:18

trillion, ahead of Apple at 3.3.

36:22

And for reference, Microsoft is at 3.06.

36:26

Uh, for reference, NVIDIA has gone up by, uh, more than 850 percent since the end of 2022.

36:38

So yeah, still an insane story of NVIDIA's

36:43

Jeremie: rise. It's sort of funny because it's, uh, like all my friends at the labs, like not to make it a whole stock story, but a very, very, uh, big wave of people who went in hard

36:53

on, on NVIDIA from the frontier labs, like in the sort of like, uh, 2021, 2022 era.

37:00

And, um, uh, you know, you think about the, the, the, the revenues they're making plowing it into NVIDIA and now that's kind of 10 X in value.

37:08

I, I, Yeah, there, the, anyway, there's a conviction about where this all might be going.

37:13

We're not giving stock advice on the show. Don't invest based on our stock advice.

37:17

Um, but, uh, yeah, certainly AI scaling has been good to NVIDIA.

37:21

Andrey: Yeah. I will say, I remember when I was in grad school, like in 2017, 2018, I was like, Oh wow.

37:27

NVIDIA is really doing good because of all of this deep learning stuff.

37:30

And here GPU is being a backbone of deep learning, which is a big thing in AI.

37:36

And even at the time I was like, I wish I had money to invest, but I was

37:39

a poor grad student. Jeremie: So, uh, well, and Jensen saw that in like 2014, 2013, right?

37:45

Like he, he has been positioning NVIDIA and the whole CUDA ecosystem for this for a long time.

37:51

And yeah, it's a pretty wild.

37:54

Andrey: Moving on to projects and open source.

37:56

The first story is about Nous Research, which we've covered a couple times, and them launching a user-facing chatbot.

38:05

So this group has previously released Hermes, specifically Hermes 3 70B in this case.

38:14

It's a variant of Llama 3.1, and for Nous Research,

38:21

one of their big trademarks is these unrestricted models.

38:25

So having free access for all, doing completely unrestricted ability to track, so less safety, uh, This one, uh, the article writer at least did find that it did, uh, refuse to do certain

38:39

things like go into how to make drugs, although according to, uh, Nous, this is not from them.

38:47

So they didn't add any guardrails to this user facing chat bot.

38:52

Some of it was already baked into a model.

38:54

previously. Jeremie: Yeah, I do find this interesting.

38:57

Like, um, there's a certain, um, eagerness to do like fully, fully, uh, no guardrails.

39:05

Like, I don't think even like even XAI doesn't, um, uh, or sorry, even the platform X.

39:11

through Grok and kind of xAI. Therefore, they don't pretend to like, be trying to do a fully no holds barred thing, right?

39:17

They're like, we will adhere to the law and, uh, and not produce things like, you know, child, child pornography or whatever else.

39:24

Um, so same, same thing's happening here. And, and Nous is interesting because they are especially into this thesis in a, what I interpreted earlier as, like, a more extreme way.

39:34

Um, but here they're, they're basically saying like, Oh no, like, of course, of course we have safeguards on the actual model.

39:40

Like, of course we try to prevent it from, you know, from doing really, really bad things like helping you make illegal narcotics, like meth, like naturally.

39:48

Um, so anyway, the, the model as you'd expect has been jailbroken; Pliny the Prompter, um, very, uh, very quick on the case as usual, uh, finding

39:57

a really powerful exploit to basically get some through everything.

40:01

Um, you know, that's. It's only interesting.

40:03

I mean, we, we, I'd love to do a deep dive on Pliny the Prompter's methodology and, and approach.

40:08

Cause there's some fascinating stuff there, but, um, Nous, really interesting to note that they're even launching this, right?

40:14

This is not a new model. It is just a chat interface.

40:16

So they are trying to play in that space as well.

40:19

Um, yeah, so. We'll see where it goes.

40:22

I mean, I don't know if they're going to be charging for this stuff at some point or how that'll play out, but they are really into the, you know, make

40:27

it available for everybody up to and including training methodology, right?

40:30

We covered their DisTrO optimizer a couple episodes ago that, um, anyway, it's meant to make it possible for like people to pull off massive training runs

40:39

distributed across basically the whole world between GPUs, that type thing.

40:43

So, uh, Andrey: anyway. That's right.

40:45

And this is, I suppose, part of a platform, Nous Chat.

40:49

So that's very much like a ChatGPT-type interface.

40:52

You log in, you have a text prompt window.

40:57

It has a fun kind of visual style to it.

41:00

A little more like. I don't know, old windows or a terminal.

41:04

It looks a little, I don't know, nerdy.

41:06

And one fun thing about it that is kind of interesting is you do have access to a system prompt and you can modify it directly, which is not the case.

41:17

with ChatGPT. So just to read a bit, the system prompt that is here by default is: you are Hermes, an AI to help humans build, create, flourish, and grow.

41:28

Your personality is empathetic, creative, intelligent, persistent, powerful, self confident, and authentic.

41:33

top adaptable. You communicate informally and in succinct responses that feel just like another human, et cetera, et cetera.

41:41

So, uh, I don't know, neat that they do provide access to that and you can configure it.

41:48

Next up, we got FrontierMath, a new benchmark.

41:53

So this one is crafted by over 60 expert mathematicians from top institutions and has original unpublished problems across various branches of modern mathematics,

42:07

meaning that you shouldn't be able to find it on the web and learn on it.

42:12

So compared to existing benchmarks like GSM 8k and math, which have simpler problems and do have the benchmarks out there with.

42:22

Here you have problems that require deep theoretical understanding and creativity.

42:29

And as a result, things like GPT-4 and Gemini 1.5

42:32

Pro struggle and solve less than 2 percent of the problems.

42:38

I believe that was a quote from Terence Tao, one of the people involved that this should be challenging for models for at least a year or at least a couple of years.

42:48

Jeremie: Yeah. And they've got an interesting framework that they, so it's not just the benchmark, right?

42:53

They're coming out with a whole evaluation framework that's all about automated verification of answers.

42:58

Part of that is to prevent, um, uh, guessing.

43:02

So they, they want to prevent LLMs from being able to succeed just by.

43:05

by kind of throwing out a guess and doing well.

43:08

Um, so they set up these, uh, these questions so that the, the responses are, uh, deliberately complex and non obvious like the, the correct answers, uh, to

43:17

get, to reduce the chances of, of guessing, getting you to where you want to go.

43:21

Um, they're also designed to be the kinds of problems that it wouldn't take.

43:25

Like, it's not just a question of, you know, I want it to take me a really long time to find the answer to this question, but I can do it through relatively straightforward reasoning, right?

43:34

So it's not like an undergraduate physics question, for example.

43:38

Um, it's also not like a, uh, some of the, the, you know, GPQA questions, like the graduate, um, uh, question answering questions, which sometimes you can answer in one shot.

43:50

Like without thinking you need to have the expertise, but if you have it, in some cases in that data set, you can just go ahead and respond without thinking too, too much.

43:58

They're trying to combine those two things together. They want it to be like really, really hard and also require hours, if not days, as they put it of human.

44:07

Uh, thought time to, uh, to solve for.

44:09

So you can really see, I mean, like everybody keeps saying this with new benchmarks, um, if a model can solve this, then, then it's going to be AGI, right?

44:18

The only AGI will be able to solve this. The problem is every time, you know, we keep seeing these new benchmarks come out, there keeps being a trick, you know, some way to make models that do really, really well at it.

44:27

Occasionally those tricks actually have broader implications for AGI kind of for the spillover in general knowledge.

44:33

Um, but, uh, but, you know, and that can happen quite often, but they, they certainly don't require the full kind of AGI that, uh, that some people think they might.

44:42

So this one, yes, we're at 2 percent right now, success rates for cutting edge language models like Claude 3.5

44:48

Sonnet, um, you know, Gemini 1.5 Pro, all that stuff.

44:51

Uh, but, um. Yeah.

44:53

Uh, unclear what's actually going to, going to get them there.

44:56

Is it a better agentic scaffold? Uh, is it a better trained foundation model?

45:00

You know, what is it? It's, it's, it's going to be interesting to see what actually ends up cracking this, this metric.

45:05

Andrey: Pretty impressive to see, or at least a sign of the times you could say that now people are developing these absurdly difficult things that most humans could even try.

45:15

Like they have some sample problems.

45:17

Uh, this one that is in the paper, they say is high million, to medium difficulty.

45:23

Just to read the problem, it's: construct a degree 19 polynomial p(x) in C[x] such that X has at least three, but not all linear, irreducible components over C.

45:35

Choose p(x) to be odd, monic, have real coefficients and linear coefficient negative 19, and calculate p(19).

45:44

So I don't know what that means. And we Uh, solution they provide in paper is like an entire page with a bunch of references to various, uh, theorems and so on.

45:57

So this is like hardcore math over here, and I suppose it's not surprising that current LLMs, uh, can't, uh, beat it just

46:05

Jeremie: yet. Yeah, and you can really see in that, that problem phrasing there, the, the layering on of sequential requirements that makes it harder to guess, right?

46:12

You can't just like one shot that, um, even with a guess, like you'd have to guess multiple things, right?

46:17

Which reduces the chances that you get a, an anom anomalous result.

46:21

So it's all meant to make it harder. Automated, automatically, uh, evaluate, evaluatable.

46:27

Geez, having a, having a hard time saying the words.

46:29

Andrey: And last up, we do have a new open source model.

46:34

This is Hunyuan Large, an open source mixture of, uh, experts model with 52 billion activated parameters by Tencent.

46:44

So this has a 389 billion total parameters.

46:49

And it's pretty beefy and impressive.

46:52

So it can process up to 256, uh, 256,000 tokens, and does beat Llama 3.1

47:01

70B on various tasks like logical reasoning, language understanding, and coding.

47:08

Seems to be, uh, somewhat comparable to Llama 3.1

47:13

400, uh, 405B.

47:17

So certainly seems like Tencent is trying to flex the muscle and showcase their ability to build this

47:26

Jeremie: scale of model. So one of the, one of the interesting things about this paper is, so they present a whole bunch of scaling laws and they share, you know, their, their thoughts about like, you know, how,

47:36

how many, um, tokens of text and, um, uh, of text data and how many parameters and all that.

47:42

So when you, when you do the math, at least by, by my math, um, uh, which, uh, Claude is very helpfully helping me with, uh, we get to a compute budget of about 10 to the 21 flops, right.

47:54

And compute budget is also something that, you know, It's good to be interested in when you see a Chinese model, because one of the things

48:00

that they're really constrained by is U.S. export controls on hardware.

48:04

And they find it really hard to get their hands on enough hardware to train these models.

48:07

So here we have 10 to the 21 flops.

48:09

So for reference, when you think about a GPT-4 class model, a Llama, a Llama 3 400B class model, you're looking at training budgets there of about 10 to the 25 flops.

48:20

So we're talking 10, 000 times bigger than.

48:24

Uh, is that right? Yeah, 10, 000 times bigger than this model in terms of compute budget.

48:28

So, so I find this really weird.

48:31

They claim that this model is on par with Llama 3 400B.

48:35

I may be missing something in my calculations. If somebody like, if you can spot this, like please do.

48:40

Uh, this seems to me to be, Uh, very much stretching like this.

48:45

This seems very frankly, like implausible.

48:47

I must be missing something or the paper must be missing something.

48:50

But, um, if that is the compute budget, then, then they're doing something really janky, really weird.

48:56

And, uh, and that would be the headline, like if the actual budget was that, but again, um, yeah, llama 10, 000 times greater.

49:04

training budget. And, uh, and here they're, um, uh, they're saying that it, it performs on par with Llama 3.1

49:10

405B. So that doesn't make any sense to me.

49:13

Um, would, would love to, uh, Yeah,

49:16

Andrey: it seems that maybe there's a typo.

49:18

Maybe we didn't quite run the equation right.

49:22

They do say they trained for 7 trillion tokens and there are 52 billion activated parameters; that would mean that it shouldn't be that

49:32

different on that order of magnitude. So lots of details in the paper, they do talk about the architecture, the number of layers, the attention heads, the type of attention used, which is also the case with LLAMA.

49:44

So these kinds of details on the nitty gritty of how this is implemented always, I think is useful for pretty much everyone working on LLMs.
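For listeners who want to redo the back-of-the-envelope themselves: using the common C ≈ 6·N·D approximation with the figures discussed above (52B activated parameters and 7T tokens for Hunyuan Large, and roughly 15T tokens for Llama 3.1 405B as publicly reported), the gap comes out closer to an order of magnitude than to 10,000x, which is Andrey's point. This is a rough sanity check, not the paper's own accounting.

```python
# Back-of-the-envelope training compute, using the common C ~ 6 * N * D
# approximation (N = active parameters, D = training tokens). Token counts
# are publicly reported figures; this is a rough sanity check only.
def train_flops(active_params: float, tokens: float) -> float:
    return 6 * active_params * tokens

hunyuan = train_flops(52e9, 7e12)       # 52B activated params, 7T tokens
llama_405b = train_flops(405e9, 15e12)  # 405B params, ~15T tokens as reported

print(f"Hunyuan-Large ~ {hunyuan:.1e} FLOPs")      # ~2.2e24
print(f"Llama 3.1 405B ~ {llama_405b:.1e} FLOPs")  # ~3.6e25
print(f"ratio ~ {llama_405b / hunyuan:.0f}x")      # roughly 15-20x, not 10,000x
```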

49:55

And on to research and advancements, we begin with some work from Google and some affiliates called Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA.

50:09

So this is a pretty.

50:12

Novel, pretty interesting technique for getting more out of tiny models.

50:15

As we've seen, we've made more and more gains in the space of one and two billion parameter models.

50:22

And this one introduces the notion of recursive models.

50:26

What that means is they train, uh, like a vanilla transformer has N layers, right?

50:32

And each layer is distinct.

50:34

What we do in this paper is Say that you can take a set of layers and then basically stack them again and again.

50:42

So you have your P2 layers a few times in a row.

50:46

And just by doing that, you're able to, uh, still go to a small size, but retain the performance of a larger model.

50:56

And that's probably the title of the paper. The relaxed part there is that.

51:00

While they do repeat the, um, layers a few times, they still apply LoRA to differentiate them slightly across layers.

51:12

So that, I think, is a neat little technique showcasing continued progress in the space of being able to really squeeze out all the performance out of, uh, less and less parameters.
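A minimal PyTorch sketch of the idea as described: a shared layer applied recursively, with a separate low-rank (LoRA-style) correction at each depth so the repeated copies are only approximately tied. The architecture and hyperparameters here are illustrative, not the paper's exact recipe.

```python
# Minimal sketch of a "relaxed" recursive transformer block: one set of
# shared weights looped over several depths, plus a per-depth LoRA delta.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base  # shared weights, reused at every depth
        self.A = nn.Parameter(torch.zeros(base.in_features, rank))
        self.B = nn.Parameter(torch.randn(rank, base.out_features) * 0.01)

    def forward(self, x):
        # shared path + small per-depth low-rank correction
        return self.base(x) + (x @ self.A) @ self.B

class RecursiveBlock(nn.Module):
    """One shared transformer-ish layer applied n_recursions times,
    with depth-specific LoRA adapters on the feed-forward part."""
    def __init__(self, d_model: int, n_recursions: int):
        super().__init__()
        shared_ff = nn.Linear(d_model, d_model)  # the shared parameters
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.ff_per_depth = nn.ModuleList(
            [LoRALinear(shared_ff) for _ in range(n_recursions)]  # same base, distinct LoRA
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.n_recursions = n_recursions

    def forward(self, x):
        for depth in range(self.n_recursions):  # loop the same layer
            a, _ = self.attn(x, x, x)
            x = self.norm1(x + a)
            x = self.norm2(x + self.ff_per_depth[depth](x))
        return x

block = RecursiveBlock(d_model=64, n_recursions=3)
tokens = torch.randn(2, 10, 64)   # (batch, seq, d_model)
print(block(tokens).shape)        # torch.Size([2, 10, 64])
```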

51:26

Jeremie: Yeah, this is a really interesting paper for a lot of reasons, including the hardware interaction here.

51:30

But, um, for, for sort of intuition building, like I found this really weird when I read it, to be honest, I was like, how I wasn't familiar with the literature around.

51:39

Cause there is some, um, around, I guess what they're calling recursive transformers.

51:43

People have done some little experiments. Right. And then

51:45

Andrey: actually just to call this out, uh, it might be confusing.

51:49

So recursive. Going back a little while, like there has been research on recursive different from recurrent.

51:57

So recursive is different because you're not kind of updating a hidden state.

52:03

There's no like, uh, time sequence element here.

52:06

It's really just, you have one input and you pass it through the same.

52:12

neural network several times to get to a better answer.

52:16

So you take an input, you pass it forward to get an output, and you put that output back through the same set of weights.

52:23

And that's what it means to be recursive. And yeah, it's been out for a little while that it actually is possible to train neural nets to be better after several recursive passes, several passes through itself.

52:35

And, uh, yeah, I'll let you, Jeremy, take over.

52:38

Jeremie: Yeah, no, but, but that fact itself, right?

52:41

That, that's something that I was not aware of going in myself.

52:44

And, and it's, it struck me as quite counterintuitive, right?

52:47

You, you, you put data into a model at layer one, and then you make it go through layer one.

52:54

And then instead of going to layer two, you may go back through layer one and over and over and over again.

53:00

And you get a better result out. And, you know, I was trying to build some intuition around this.

53:04

Best I could tell is like, so reading a book twice, right?

53:08

You're, you're kind of doing the same thing, even though you're using the same, um, algorithm, the same, the same layers and all that,

53:15

uh, you're able to extract more and more information with each pass.

53:19

And so this is essentially the same principle.

53:22

Basically you're chewing on the data more. Uh, you can think of it

53:25

as a way of just expending more compute in the process of chewing on that data.

53:28

If, if you want to compare it to just like, uh, feeding it through just the layer one time, now feed it through multiple times, you get it, you get a better result.

53:35

Um, so one of the challenges is, sorry, let's talk about the advantages.

53:40

First, the advantage is, uh, you are copy pasting the same layer over and over, which means you don't need to load a, you know, Uh, I don't know, uh, uh, an 8 billion parameter model,

53:51

maybe you get to load a 4 billion parameter model if you reuse every other layer, right?

53:56

Or, um, uh, anyway, you know, you can, you can keep playing games like that, where you have a layer stacked like three times in a row, the same layer, and then

54:03

a different, you know, the next layer, and copy that one three times in a row.

54:07

Or it could be all the same layer. There are all those configurations that are possible.

54:10

And so, um, one of the advantages here is it cuts down on the amount of memory that you need to use on your chip, right?

54:17

This is really good for memory usage. Um, you still need to run the same number of computations, though, even though your layers are identical, your weights and parameters are identical,

54:27

your data as it's, you know, the embeddings of that data are changing.

54:32

So you still have to run those calculations.

54:34

So the logic cost, the number of flops, the flop capacity of your hardware is still, you know, needs to be utilized intensely.

54:42

There is a way that you can even get an advantage on that level, though, um, because so much of your, your computation looks the same, it makes it easier to parallelize it.

54:52

So they, they have a section of the paper on continuous depth-wise batching, where they're talking about, okay, how can we leverage the fact that the layers are identical to make the actual logic

55:03

less demanding on the chip, which is, which is really cool.

55:06

But the, the really big boon here is for memory usage.

55:09

Cause you're literally, you're functionally cutting down on the size of your model, right?

55:11

Pretty dramatically. Um, so that's, that's really cool.

55:15

It's, it's such a dead simple method. There is this technique that they're using,

55:19

uh, that seems to have worked best in terms of deciding which layers to copy paste, uh, that they call their stepwise method.

55:25

This was the, the one that worked best. So basically they would take, if you have a, I don't know, like a 20 layer transformer, um, they would take every other layer and copy it once.

55:35

So, you know, take layer one, Um, repeat layer one, one time, right?

55:39

Then take layer three, which would be the next one.

55:42

'Cause you go layer one, layer one, then layer three, layer three, then layer five, layer five, layer seven, layer seven, all the way up to 20.

55:48

And that's kind of the thing that they found worked best.
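
As a quick illustration of that stepwise pattern, here is a tiny sketch (my own, not from the paper) of which layers get kept and repeated in a 20-layer model: every other source layer is stored once and used twice, so depth stays the same but only half the distinct weights are kept.

```python
# Stepwise tying pattern: keep every other layer and repeat it once.
num_layers = 20
source_layers = list(range(0, num_layers, 2))               # layers 0, 2, 4, ..., 18 (0-indexed)
tied_schedule = [i for i in source_layers for _ in (0, 1)]  # each kept layer appears twice

print(tied_schedule)
# [0, 0, 2, 2, 4, 4, 6, 6, 8, 8, 10, 10, 12, 12, 14, 14, 16, 16, 18, 18]
print(f"distinct weight sets: {len(set(tied_schedule))} for depth {len(tied_schedule)}")
```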

55:51

Uh, the intuition behind that just being that, hey, there was, like, prior work that showed that this worked.

55:57

So a lot of this is just sort of like janky engineering, but still a really interesting way to kind of play with again, play with hardware, see what can we do with chips that have crappy

56:07

memory but maybe good logic, you know, unclear, like, which chips would necessarily fit into that category once they use this continuous depth-wise batching strategy, but,

56:18

but really interesting and a great way to get more out of your, yeah, out of your AI hardware.

56:23

Andrey: This paper has quite a bit, uh, to it, a lot of details that are interesting.

56:28

So they do use a stepwise strategy, uh, initially, but when they add this other trick of LoRA for these layers, to be able to adapt them slightly, uh, for different layers,

56:40

uh, they do a slight modification to the stepwise method where they average two layers.

56:46

So, like, layer one is an average of layers one and four, and then the other one is the average of layers two

56:51

and five. Just empirically, they found this worked better, and you do need to, uh, they say "up-train."

56:57

So you need to train, uh, an initialized model for a little while to get it to work well, but they do say that you don't need to train it very much.

57:07

With just, like, a basic 15 billion tokens of up-training, a recursive Gemma 1 model outperforms

57:14

even full-size pre-trained other models like Pythia and TinyLlama.

57:20

So yeah, it's, it's quite interesting and we'll be seeing, I guess, if this gets adopted in practice.

57:28

And, Jeremie: and, um, I don't know if we talked about the, the LoRA adapter role in this kind of conceptual structure, but maybe just worth emphasizing when you lock

57:38

in, right, these parameters and you're just repeating the same layer over and over.

57:42

Um, you, you might want to give your model a little bit more

57:46

degrees of freedom than that, a little bit more of an ability to kind of adapt to the new problem domain that you're going to be training it on.

57:53

And that's really where the LoRA adapters come in.

57:55

It gives the model a little bit more room to stretch itself, right?

57:58

Hence the "relaxed," uh, um, qualifier here in relaxed recursive transformers.

58:03

You're giving it a few more degrees of freedom to, uh, to kind of modify itself without that constraint of all these layers have to be the exact same.

58:10

So that's kind of the intuition. Andrey: Yeah, right. So Laura also, uh, for some reference is a way to, uh, like say efficiently change a bunch of weights by tweaking a smaller set of parameters.

58:23

you could basically reduce it to. So that's the idea here: you're not updating everything, you're still sharing most of the weights, but you update a few parameters that make them a little more distinct.
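
Here's a minimal sketch of that idea, with shapes and rank chosen for illustration rather than taken from the paper: one weight matrix is shared across several depths, and each depth gets its own small low-rank (LoRA) delta so the repeated layers can drift apart slightly.

```python
import torch
import torch.nn as nn

hidden, rank, depths = 256, 8, 3

shared_weight = nn.Parameter(torch.randn(hidden, hidden) * 0.02)  # tied across all depths
lora_A = nn.ParameterList([nn.Parameter(torch.randn(rank, hidden) * 0.02) for _ in range(depths)])
lora_B = nn.ParameterList([nn.Parameter(torch.zeros(hidden, rank)) for _ in range(depths)])  # zero init: delta starts at 0

def layer_forward(x: torch.Tensor, depth: int) -> torch.Tensor:
    # Effective weight at this depth = shared weight + that depth's own low-rank update.
    delta = lora_B[depth] @ lora_A[depth]                 # (hidden, hidden), rank at most 8
    return torch.relu(x @ (shared_weight + delta).T)

x = torch.randn(4, hidden)
for d in range(depths):                                   # same base weights reused at every depth
    x = layer_forward(x, d)
print(x.shape)                                            # torch.Size([4, 256])
```

The zero-initialized B matrices mean training starts from purely shared weights, and each depth only learns a small, cheap correction on top.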

58:35

And onto the next research, we got applying the Golden Gate Claude mechanistic interpretability technique to protein language models.

58:45

And this is, not a paper, actually, this is more of an open source project that looked into the idea of, uh, applying the same technique that, uh, we've

58:56

covered, I believe, like, now a few months ago, where we had sparse, uh, autoencoders that can be applied to LLMs to get interpretable features.

59:08

So you can say, uh, the, the famous example, I guess, is the Golden Gate Bridge

59:14

feature in Claude: you can see that there is this kind of notion, concept, within Claude that gets activated for certain inputs.

59:24

And that is done via the sparse autoencoder technique that compresses the outputs of certain layers in the LLM and then finds regularities at a high level.

59:35

So this work was applying that same technique to, uh, a model specialized in protein prediction, I guess, protein language models.

59:46

And, uh, they found some interesting features, uh, in this context.

59:50

And I think, uh, Jeremy, you read more into it, so I'll let you go ahead and take over.

59:55

Jeremie: I mean, I, I really like this, um, uh, this paper and, and for, for context to like the, the SAE, the sparse autoencoder is a bit of a darling of the AI

1:00:03

interpretability world, especially among folks who care about loss of control scenarios.

1:00:08

And like, is my AI trying to, um, plot against me, trying to "scheme," as, believe it or not, the technical term is.

1:00:14

Um, so the, uh, the idea here is, yeah, you have somewhere, so let's pick a middle layer of our transformer.

1:00:21

And we'll pick specifically the residual stream.

1:00:23

So residual stream is basically the, the part of the circuit in the, in the, I shouldn't say circuit, the, the, the part of the, the architecture that takes, um, whatever

1:00:33

the, the activations were from the previous layer, just copy-paste them into the next one.

1:00:37

It's a way of preventing the information from degrading as it gets propagated through the model.

1:00:42

But anyway, uh, essentially pick a slice, uh, of, of your transformer and, uh, you, you feed the model some kind of input and you're going to get activations at that layer, right?

1:00:52

Now pick those activations and use them

1:00:56

as the input, okay, you're going to feed them to a model called a sparse autoencoder.

1:01:01

The sparse autoencoder is going to take those activations and it's going to have to represent them using a small set of numbers, like a compressed representation.

1:01:10

So, you know, maybe you have, uh, well, as a cartoonish version of this, say you have 10,000 activations.

1:01:16

Um, then you want to compress them down to like 100 dimensional vector, right?

1:01:20

So that's what the sparse auto encoder is doing. It compresses them.

1:01:23

And then from that compressed representation, it's then going to decompress them and try to reconstruct the original activations.

1:01:30

And the loss function it uses is usually something like the, the difference between the old and the reconstructed, the true and the reconstructed activations.

1:01:37

So basically it just gets really good at compressing these activations down to a smaller

1:01:43

representation. It turns out, and Anthropic found this, that when you do that, uh, the, the individual entries in that smaller compressed representation end up correlating to human interpretable features.
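
For the curious, here is a minimal sketch of that setup, using the cartoon dimensions from the discussion (in practice these SAEs are often much wider than the activations, with most code entries near zero): encode the activations into a code, decode them back, and train on reconstruction error plus an L1 penalty that keeps the code sparse.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, activation_dim: int = 10_000, code_dim: int = 100):
        super().__init__()
        self.encoder = nn.Linear(activation_dim, code_dim)
        self.decoder = nn.Linear(code_dim, activation_dim)

    def forward(self, activations: torch.Tensor):
        code = torch.relu(self.encoder(activations))   # the (hopefully) interpretable feature code
        reconstruction = self.decoder(code)
        return code, reconstruction

sae = SparseAutoencoder()
activations = torch.randn(8, 10_000)                   # activations grabbed from a chosen layer

code, recon = sae(activations)
# Reconstruction error plus a sparsity penalty on the code.
loss = ((recon - activations) ** 2).mean() + 1e-3 * code.abs().mean()
loss.backward()
print(code.shape, recon.shape)                         # torch.Size([8, 100]) torch.Size([8, 10000])
```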

1:01:56

Uh, so for example, like the idea of deception might be captured by one or a small number of those numbers.

1:02:04

Um, the idea of, of a molecule might be captured, you know, in, in the same way.

1:02:08

And so this is basically just meant to be a way of taking this very complicated thing, all the activations in this residual stream, and compressing 'em down to a manageable number of, of numbers that we can actually

1:02:21

get our arms around and start to interrogate and understand and interpret, right?

1:02:25

So that's kind of part of that, that hope of the alignment game plan: like, we'll be able to use this to understand the thinking, in real time, of AIs that are potentially very dangerously advanced.

1:02:33

That's the theory. Um, a lot of interesting success has been found there, including on steering the model's behavior.

1:02:39

So if we do something called clamping, we pick one of those rep, one of those numbers in that compressed representation.

1:02:45

And let's, let's say it's the number that represents banana or encodes the idea of banana.

1:02:50

We crank up its value artificially.

1:02:53

And then we reconstruct the activations.

1:02:55

We can then get the model, uh, based on those activations, to generate outputs that are tilted towards banana, whatever that means.

1:03:02

Maybe it talks a lot about bananas or something like that.
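
A minimal sketch of that clamping intervention, assuming you already have a trained SAE decoder and have identified which code entry tracks the concept you care about (the index and clamp value below are made up for illustration):

```python
import torch
import torch.nn as nn

code_dim, activation_dim = 100, 10_000
decoder = nn.Linear(code_dim, activation_dim)   # stand-in for a trained SAE decoder

code = torch.zeros(1, code_dim)
bridge_feature = 42                             # hypothetical index of the concept's feature
code[0, bridge_feature] = 10.0                  # clamp it to an artificially large value

patched_activations = decoder(code)             # steered activations, spliced back into the model's forward pass
print(patched_activations.shape)                # torch.Size([1, 10000])
```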

1:03:04

That was the golden gate, uh, Claude, uh, experiment, right?

1:03:07

So they found the entry that corresponded to the golden gate bridge.

1:03:11

They clamped it to give it a really high value.

1:03:13

And then that caused the model to yap on about the golden gate bridge.

1:03:16

So here, the question is going to be, will we find the same thing if we work on, uh, transformers that are trained on bio sequence data, and they pick a model that was developed by this

1:03:27

company, ESM, sorry, sorry, this company EvolutionaryScale, that's made the ESM series of models.

1:03:32

So we covered ESM three many months back, fascinating model.

1:03:37

Uh, it was the first ever bio sequence model, by the way, to meet the threshold of reporting requirements under Biden's executive order back then.

1:03:45

Um, so it was a really, really big model. What they did was they took a smaller model, ESM two that that company had built, and they played the same game.

1:03:51

Can we pick a, you know, a middle layer of, of that transformer?

1:03:55

Um, you know, build a sparse autoencoder and can we recover human interpretable features, right?

1:04:02

Can we find features that correlate with, in this case, uh, common structural components or facets of biomolecules?

1:04:10

A common example here would be like the alpha helix.

1:04:13

So if you put proteins together, um, certain kinds of, sorry, if you put amino acids together, certain kinds of amino acids, when you string together to form a

1:04:20

protein, they'll, they'll tend to form a helical structure called an alpha helix.

1:04:26

The other, um, secondary structure that they sometimes form is called a beta sheet or beta pleated sheet or whatever.

1:04:32

There are all these different, um, structures that these things will form depending on the kinds of Lego blocks, the kinds of amino acids that you string together.

1:04:40

They all have slightly different charges. So they attract and repel in these nuanced ways.

1:04:43

And it's notoriously hard to predict what the actual structure is going to be, but, well, here, using this technique, they're able to find, okay, we actually have in our SAE, in that reduced

1:04:53

representation, we have some numbers that correlate with, oh, this is going to be an, there's going to be an alpha helix here, a lot of alpha helices or, you know, beta hairpins or whatever else.

1:05:03

And so that's interesting from an interpretability standpoint, we can understand a little bit more what goes into making these, uh, these proteins take the shape they do, but then they also found

1:05:14

that by modifying the uh, values in that compressed representation by doing this clamping thing and artificially, you know, let's say we enlarge the value of the alpha helix, um, number,

1:05:25

you could actually prompt the model to output sequences that would have more alpha helices.

1:05:31

And so this is kind of interesting from a protein design standpoint, right?

1:05:35

It's the first kind of tantalizing hint. Uh, well, maybe not the first, but you know, bucket it with, uh, alpha fold as a series of tools that could allow us to better understand how proteins

1:05:45

fold and actually come up with designer proteins with certain structural characteristics that otherwise would be really, really hard to, uh, to design.

1:05:53

Andrey: And onto the lightning round, we begin with a pretty fun blog post, From Naptime to Big Sleep: Using Large Language Models to Catch Vulnerabilities in Real-World Code.

1:06:06

So this is by Google's Project Zero.

1:06:09

And this is a team that's been around for a while, since 2014, dedicated to finding so-called zero-day vulnerabilities.

1:06:16

So, vulnerabilities in code that aren't yet known or out in the wild, which hackers can then exploit without there being protections for them.

1:06:26

They have previously had this Project Naptime,

1:06:30

evaluating offensive security capabilities of large language models.

1:06:34

They had this blog post several months ago where they introduced this framework of large language model assisted vulnerability research and demonstrated the potential

1:06:45

for improving state-of-the-art performance on the CyberSecEval 2 benchmarks from Meta.

1:06:52

That was a little while ago, and now Naptime has evolved into Big Sleep, where Google Project Zero is collaborating with Google DeepMind.

1:07:01

And in this blog post, they announced a pretty exciting result from this Big Sleep agent, this LLM that's optimized for helping with, uh, I guess, vulnerability

1:07:13

detection: they discovered a vulnerability via this agent, an unknown, real vulnerability in a major project, SQLite, and reported it, and the developers fixed it.

1:07:27

So to their knowledge, this is the first time an AI has been used to find a real world vulnerability like that.

1:07:35

And this blog post goes into a whole lot of detail on the vulnerability, which seems to be,

1:07:40

you know, a somewhat tricky case, not, uh, uh, some sort of trivial discovery, so to speak.

1:07:47

So very exciting for implications of being able to fight

1:07:50

Jeremie: hackers with AI. Yeah.

1:07:52

And also a warning shot that, Hey, these things can actually, AI can now discover real world vulnerabilities.

1:07:59

It's always a double edged sword with these things, but, um, yeah, it's, and that's been a big, uh, question mark, right.

1:08:05

In the debate over AI and what risks it might pose is, you know, I've had debates with people where, you know, they'll, they'll say, well, you know, we haven't seen an

1:08:13

AI system actually successfully discover cyber vulnerabilities in real world systems.

1:08:17

And so therefore, et cetera, um, now that we have, I mean, I, I wonder, I wonder what the implications may be, but there've been

1:08:24

pilot studies, we've talked about a couple, you know, finding, first, it was one-day vulnerabilities, where the exploit has already been logged somewhere.

1:08:31

And now you're just getting an AI agent to exploit it. And then zero days, which is, you know, really figuring out without knowing whether there is even a vulnerability, finding one from scratch in kind of more toy settings.

1:08:42

This is the real world though. This is finding one in a, I mean, SQLite is a very, very popular library.

1:08:48

Um, and this is an interesting, uh, an interesting bug, an interesting exploit.

1:08:52

It's a null pointer dereference, which essentially is you have a pointer that points to memory addresses, and this vulnerability allows you to control what it points to.

1:09:03

And so this allows you essentially to have some control over what gets written or read to memory, and that could, in principle, allow the attacker to pull off arbitrary code execution.

1:09:16

And, um, essentially, you know, if you, if you just point the, the pointer to, um, some specific, you know, like buffer space or, or some adjacent memory, you may be able to

1:09:26

actually like draw that, you know, pull that data in and use it for whatever purposes.

1:09:30

So there are a lot besides that, there's just like making the application crash, right?

1:09:34

You just have to have a like fucked up pointer or something, and it just won't work.

1:09:37

Um, so all that is kind of interesting. Uh, they go into how this thing works, and it is, I think, quite a, quite an interesting improvement over current techniques, like the best techniques we

1:09:49

have right now, which include things like fuzzing, where you basically just throw everything and the kitchen sink at your application, at your, at your software, and just, like, see, uh, if anything

1:09:57

breaks, um, this is a much smarter approach, obviously, uh, powered by a thinking AI system.

1:10:03

Um, so yeah, pretty cool. Um, and, uh, this was by the way, a bug that did remain undiscovered after 150 CPU hours of fuzzing.

1:10:12

So people had tried the standard techniques on this many times over.

1:10:16

Um, makes sense. It is a popular library. But, uh, but those techniques failed, whereas this, uh, AI-powered run succeeded.

1:10:23

Andrey: And one more story in this section.

1:10:25

This one, not about progress, but rather lack of progress and some unknown research.

1:10:32

Uh, so it's about OpenAI. There's been a report from The Information about them

1:10:36

reportedly working on new strategies to deal with an AI improvement slowdown.

1:10:43

So OpenAI has been working on something like a GPT 5, uh, the upcoming model has been codenamed Orion, and there you go.

1:10:51

That's the reference to Orion from before.

1:10:54

And the report is saying that it seems to not be showing as significant an improvement over predecessors as in previous iterations.

1:11:02

So in the leap from GPT 3 to GPT 4, there was a massive improvement.

1:11:08

GPT 3 was pretty impressive. GPT 4 was much more impressive.

1:11:13

And GPT 4 now was, oh, I don't know, like what, two years, two years old.

1:11:19

No, a year and a half old. It's been a while since GPT 4 came out, and we haven't had that kind of leap since, uh, except maybe you could argue with, uh, O1, with the introduction

1:11:32

of inference time compute, we saw some pretty significant qualitative gains.

1:11:38

Regardless, this report from The Information is saying that the sort of commonly used, standard trick of more data, more compute, more scale may not be as effective as it was previously.

1:11:54

So, uh, of course we are having a scarcity of new training data.

1:11:58

That's one of the issues is most of the internet has already been sucked up.

1:12:03

And there's reportedly this new foundations

1:12:06

team within OpenAI looking at possible alternatives to just scaling, like doing, uh, more in the post-training phase, uh, doing more with synthetic data from AI models, et cetera.

1:12:20

Now OpenAI has, uh, not commented on this and has previously said they have no plans to release Orion or anything like GPT-5 this year.

1:12:30

So I guess you can take it with a grain of salt, but also maybe not super surprising.

1:12:35

Jeremie: Yeah, I think this is such an interesting part of the debate and the question over scaling, right?

1:12:40

So there's, there's a question as to whether, so when we look at scaling curves, what we're typically talking about is, roughly, how well does the model's

1:12:49

next word prediction accuracy improve with more data, more compute, right?

1:12:53

And model size? Um, the challenge is, that doesn't tell you how that improvement in next-word,

1:13:00

sorry, next-word prediction accuracy does not necessarily tell you how generally useful a model is, you know, how good it is at reasoning, other things that we might actually care about.

1:13:09

And so you've got this very robust scaling law that tells you that the model is getting better predicting next tokens.

1:13:15

Um, but, but then uncertainty about the, the value that's being created in that process.

1:13:19

So that's one dimension of uncertainty without knowing what's going on with Orion, what the training data looks like, what it's intended to do.

1:13:25

Like, is this another reasoning system? It seems like it's not supposed to be, but there's a lot of fog of war here.

1:13:32

Um, without knowing that, it's hard to know whether what I've just described here is an important part of the uncertainty or whether it's, you know, like a

1:13:40

reasoning model and the inference stuff isn't working out. From what I've seen, it seems like it is more likely to be the former, that this is really meant to be

1:13:47

a beefy pre-trained, you know, GPT-5-type model, as opposed to o1, which was, you know, putting, I mean, I really don't want to say bells and whistles.

1:13:55

It's way more than that, but, uh, it certainly is.

1:13:58

leaning more towards the inference time paradigm.

1:14:01

And that was, that, that's the big leap there. You know, we have separate inference, inference-time scaling laws now to, to ride as well that complement the training-time scaling laws.

1:14:09

So that may well be enough to do some really interesting things.

1:14:12

But, um, yeah, there, there's a whole bunch of interesting, you know, gossip about OpenAI in here. Apparently back when, um, Orion was 20 percent of the way through its training run,

1:14:21

um, Sam was really excited about it and was talking internally about how this would be a big deal.

1:14:26

That's where he hyped it up. It seems that that hype has failed to materialize.

1:14:29

And that's really what's kind of an issue here.

1:14:31

Um, there's also questions about what hardware this stuff is being trained on.

1:14:35

Like, what is this training run? I'm guessing it's the H100 fleet that OpenAI is running right now to train this.

1:14:41

Um, at what scale, like, what are they really pushing in terms of scale?

1:14:44

We don't know. Really hard to know. Um, and just more generally, so they, because they are setting up this foundations team to explore kind of deeper questions.

1:14:53

Now, you know, if, if the default path to scaling the engineering path, we'll call it right where you just, you know, build faster horses.

1:15:00

If, if that doesn't, doesn't work, what do we do instead?

1:15:03

Right. That's the big question. I think in this instance, um, OpenAI has really, and quite ironically, put itself in a, in a difficult position over the last few years.

1:15:13

Right. They've bled off, I think it's fair to say, all of their best, or not all, much of their best algorithmic design talent.

1:15:20

Right. So Ilya Sutskever has left.

1:15:23

Um, we've seen, you know, the safety team, Jan Leike, uh, we, uh, we've seen, um, yeah.

1:15:28

Anyway, like basically a huge, huge amount of talent, including product talent.

1:15:32

We had, uh, Barret Zoph leave recently too.

1:15:35

There's like really, really good folks who are gone in many cases to anthropic.

1:15:39

And so if it is the case that we're moving from a domain where it's about exploiting a paradigm, um, in other words, doing really good engineering and getting, uh, scaling to work really well.

1:15:50

To a paradigm where we're looking for new ideas instead, where that's the main bottleneck.

1:15:55

Then you might anticipate talent being the main limiting factor, in which case anthropic starts to look really interesting, right?

1:16:01

You've got a lot of companies now that could be, that could be competing here.

1:16:05

Meanwhile, OpenAI is hamstrung by a relationship with Microsoft that is currently tepid at best. In the recent investor communications,

1:16:12

Microsoft did not refer to OpenAI in the future tense at all, right?

1:16:16

That is a big, big change. Um, so

1:16:19

as that starts to happen, is OpenAI forced to work with companies like Oracle to develop infrastructure because Microsoft apparently isn't meeting their needs?

1:16:26

There's tension there too. Like this starts to become really interesting for Sam and he's got to find a way to square this circle.

1:16:33

He's got to find a way to keep raising money. He's got to find a way to keep scaling for what, what that's worth.

1:16:38

And then he's got to retain talent. Um, it, it would be interesting if this turned into a very significant, uh, structural challenge for OpenAI, if they've, if they've doubled

1:16:47

down too much on scaling, but again, this is all speculative.

1:16:50

We don't know until the models start dropping.

1:16:52

And frankly, I think when the, the, uh, Blackwell series of GPUs come online, we get those big clusters running next year.

1:16:59

I mean, look, everybody I know in the space really expects big performance improvements from the early tests they're doing.

1:17:04

I suspect we'll be, we'll be looking back on scaling as, like, yep, that, that was a real thing all along, but, um, but if not, the implications for OpenAI, at least, are interesting.

1:17:14

Andrey: That's right. And also worth noting, this is not sort of unique to OpenAI, right?

1:17:18

It's an open question in general whether it is even doable to keep scaling, in part because of training data running out; that was a speculation for a while. And just, just to paint a high-level picture, right?

1:17:32

What scaling means is: GPT-3 was, like, around 180 billion parameters; GPT-4 we don't know, but the speculation or the rumors were it was, like, around

1:17:43

closer to 2 trillion total parameters, but it was a mixture-of-experts model.

1:17:47

So there was some smaller set of activated parameters, uh, and so GPT-5, or whatever the next model is, Orion, you could say, maybe would have 10 trillion total parameters, or 20 trillion, you know, that kind of jump

1:18:02

in size. And the speculation is, well, if you do that same kind of move from GPT-3 to GPT-4, GPT-2 to GPT-3, and just add more weights, add more scale, add more training, will you get that same big jump?

1:18:15

Right now, it's unclear, right, and this report is basically claiming, or seems to claim, that maybe it's not quite as successful as it has been in the past, but it remains to be seen.

1:18:28

Jeremie: Yeah, I think, uh, you know, worth noting, though, on the data scarcity side, like, There is eventually a data wall, presumably, unless synthetic data carries us over.

1:18:37

The data wall, though, is expected to kick in about an order of magnitude of flops, like training compute further than, for example, like power constraints.

1:18:46

Um, so, and right now we're not even close to power constraints in our current runs.

1:18:51

Like we're, we're seeing, you know, 10 to the 26 flop runs next year, probably shading into 10 to the 27.

1:18:57

Um, that's still like two orders of magnitude before you hit even the power constraints on the grid.

1:19:02

So right now I don't think that the data scarcity is actually the thing driving the, the limited capabilities here.

1:19:11

I think there's something, sort of, something else going on here, and we'll, we'll presumably have to wait and see.

1:19:18

Um, that's part of the reason why I'm curious, you know, what happens that next beat when we get the Blackwell clusters online, when we start to see

1:19:24

the hundred-thousand, uh, you know, GB200 GPU clusters running, like,

1:19:28

do you then see the transcendence, to use the Saudi Arabian terminology for this?

1:19:35

Do you start to see that kind of improvement? I don't know.

1:19:38

But, uh, I think it's, yeah, there's a lot of experimentation with many billions of dollars that are, that will be run, uh, to find out.

1:19:46

Andrey: That is being run. Yes. You know, this is a big question and I guess we'll find out.

1:19:52

Alrighty, moving on to policy and safety, and as promised, we are going to talk about Donald Trump's victory in the presidential election in the U.

1:20:02

S., and in particular, what it means for AI.

1:20:05

No political commentary from us, even though as a U.

1:20:08

S. citizen, I have some opinions.

1:20:11

But regardless, Donald Trump is going to return to the White House.

1:20:14

And there are, there's not a ton we know about specifics of what might happen, but we do have some pretty decent ideas as to at least some of what will happen.

1:20:25

So for instance, we do know Trump's administration is presumably going to repeal president Biden's executive order on AI, which we've covered plenty on the podcast.

1:20:36

This is a very big, Uh, order, not a law.

1:20:40

So, uh, the Trump administration, because this was just an executive order, the Trump administration could just cancel it more or less.

1:20:48

Now, there might be, uh, retention of some of the features of that.

1:20:54

It might be revising it rather than fully canceling it.

1:20:57

Uh, but it does seem likely, at least.

1:20:59

We don't know for sure. Certainly there'll be revisions to that.

1:21:04

And then of course, we know that Trump loves to fight with China, and that's been an ongoing situation in the US for a while, so there's probably more to be

1:21:14

said about that, but Jeremy, you're, you're the policy guy, so I'll let you do more

1:21:18

of the talking here. Jeremie: Yeah. I mean, I used to think of myself as a tech guy, I guess.

1:21:22

Uh, yeah, I guess half, half and half a bad.

1:21:26

Yeah. No, look, I think, um, it's funny because in the, so, so the, the policy universe I live in is, is the national security policy universe.

1:21:34

Um, to the extent that I live in the policy universe. And, um, I think that there are a lot of people in the kind of, uh, general AI safety world who, who are really concerned about a Trump administration.

1:21:45

And I actually think that a lot of those concerns are quite misplaced.

1:21:48

Like, I think this is a misreading of, um, of what we need and where we are.

1:21:54

So just for, for context, we've seen Trump on various podcasts.

1:21:58

That's all we have to go on, by the way. And this article, uh, goes in depth into comments that Trump's made.

1:22:04

Andrey: There's been no promises, uh, no guarantees.

1:22:06

So this is kind of reading tea leaves and guessing based on various comments.

1:22:10

Jeremie: Yeah, exactly. Exactly. And so you've got, you know, Trump has, rightly in my opinion, described AI as a superpower and, uh, called its capabilities alarming.

1:22:20

Um, he's also referred to China as the primary threat in the race to build advanced AI, which I think is also correct.

1:22:25

Um, and, uh, and then you've got this interesting question as to, you know, cabinet staffing, like Elon is a massive influence in the cabinet.

1:22:33

And to the extent that that, or I should say, sorry, on the transition team and broadly on the team, I don't know that he'll be in the Cabinet officially.

1:22:40

'cause he's kind of busy with a lot of companies. Um, but he, you know, obviously a massive influence.

1:22:45

Very concerned about everything from weaponization to loss of control.

1:22:49

A lot of, uh, good quotes from Dan Hendrycks in this article, who advises Elon quite a bit.

1:22:54

And, um, then the question is that's Musk.

1:22:57

Uh, that's Elon. You've got Vance on the other side, Trump's VP, obviously, um, who's expressed concerns in the past over closed source AI entrenching the tech incumbents.

1:23:06

Now, I mean, I think this is a, it's a very rational concern to have, right?

1:23:11

Like you don't want closed source, uh, Um, uh, you know, pure plays and not allow people to open source stuff.

1:23:18

I think that, you know, that is going to start to change inevitably, uh, as, as you start to see open source models actually getting weaponized.

1:23:24

And it's just going to become super obvious to all concerned.

1:23:26

And at that point, the administration clearly is preserving their optionality to, to, to, uh, Go in that direction.

1:23:32

Um, some big questions here remain around the AI Safety Institute.

1:23:36

For example, uh, that was sort of a, a spawn off of the executive order, at least the kind of, a lot of the bones were laid there.

1:23:43

Um, interesting question as to, as to whether that remains. It is the case that most Republicans do support the AISI.

1:23:49

Um, it's, it's a, a part of the broader American strategy on AI and it's a, certainly a home for expertise.

1:23:56

Question as to whether Paul Christiano continues to run it.

1:23:58

You know, that's another degree of freedom they have: they keep the AISI but swap out, uh, Paul Christiano, who is, you know, the former head of alignment at OpenAI, who invented reinforcement learning from human feedback.

1:24:07

Um, so that, that would be an interesting, an interesting question.

1:24:11

Um, but then more broadly, the executive order, right?

1:24:13

The famous Biden executive order, 110 pages, it was the longest EO in living memory.

1:24:18

You know, I think there are a lot of components there that are, uh, probably

1:24:22

likely to be preserved in a, in a Trump administration.

1:24:25

Um, I think you'll see some stuff get scrapped.

1:24:27

Look, that EO did tons of stuff. It talked about, you know, bias and civil rights and all kinds of stuff under the banner of AI.

1:24:35

Um, I think you could well see that get, you know, carved out, hollowed out, you know, the, you know, Trump has said, he's going to rip out the EO.

1:24:41

That's not a, That will probably happen, but what it gets replaced with is really at issue here, right?

1:24:46

How much of the national security stuff gets preserved?

1:24:49

I wouldn't be surprised if we end up seeing an awful lot of that stuff still in there.

1:24:53

Um, and, uh, anyway, there's all kinds of questions as well about, uh, you know, what do we do on the energy infrastructure side?

1:25:00

We have a ton of work in the United States to do, to get energy back on the table.

1:25:05

Like we have forgotten how to build nuclear plants.

1:25:08

We can't build them faster than 10 years, right?

1:25:10

We like, we need a way to keep up.

1:25:12

We just talked about the, the power bottleneck and how that kicks in at about 10 to the 29 flops.

1:25:17

Well, that's coming. Like that's, that's the training runs of like two, three years from now.

1:25:22

Um, if it takes you 10 years to build a nuclear plant, then, like,

1:25:25

you've got to change something pretty fundamental. We need to get natural gas online.

1:25:28

We need to get geothermal potentially, um, and a lot of these things align with the kind of Trumpian school.

1:25:33

So making sure AI gets built here, the questions are all going to be around, you know, what about things like loss of control?

1:25:40

What about things like weaponization and open source? Those are the big question marks.

1:25:43

And right now, again, like it's an administration that's positioned itself very, very, very openly, very flexibly.

1:25:49

Um, and, uh, and you know, the, the China angle I think is, is a very bipartisan piece too, right?

1:25:56

So I don't think we're going to see all the export controls that have been put in get ripped out.

1:25:59

I think those are actually going to be bipartisan and maintained where we might see a change, um, would be the Trump administration, maybe focusing more on enforcement.

1:26:08

Right. We've covered a lot the leakiness of these export controls under the current administration. It would be great to see, you know, actual loophole closing as fast as loopholes are opening.

1:26:18

And that's something you could see. So, um, you know, one last kind of meta note here: the uncertainty that we see

1:26:24

around what might come out of the Trump administration here reflects uncertainty in the technology, but it also reflects the classic kind of Trumpian move

1:26:33

of maintaining uncertainty for the purpose of negotiation leverage, right?

1:26:36

You see this with tariffs and all the discussion around that: the threat has to be credible so that it actually leads to leverage internationally. It's something that we've seen

1:26:45

other administrations, anyway, struggle with, is, like, if you're speaking softly and you're not carrying a big stick, then people will not take you seriously.

1:26:54

And to the extent that there's a lot of negotiation to do with China on this issue, you, you may actually want to negotiate from a position of strength.

1:27:01

And for that, you need to have the energy leverage and, uh, and other things.

1:27:05

So I think big, big questions around the AISI, big, big questions around what the focus is on open source and, um, and on loss of control.

1:27:13

But with Elon there, um, I think there's, uh, there's a lot of, uh, room for, uh, for positive stuff to happen potentially on the, the safety side.

1:27:21

So yeah, I think the, the, the story is again, much more positive.

1:27:25

A lot of the people who I know in the kind of AI safety world, um, seem, seem much more concerned about this.

1:27:32

And I, I think part of that, you know, It may just reflect a, a concern over, you know, frankly, politics, like some people are just, they just

1:27:41

don't want, they don't want this administration and that's part of it.

1:27:44

But, um, right now it's unclear and, and the, you know, we just got to wait and see.

1:27:48

I think there's some really good policies that have been put forward, uh, generally on the energy side and elsewhere.

1:27:53

So wait and see is, is the best approach probably.

1:27:56

Andrey: Right. Yeah, that's generally my impression. Also, this article goes into and basically does lay out the picture of that.

1:28:03

It doesn't seem like there's any obvious big overturnings of what's been happening.

1:28:08

There's going to be a lot of tweaks, presumably. Similarly, with the CHIPS Act, which was one of the major movements during the Biden administration, Trump has been somewhat critical of it, but

1:28:18

it's unlikely that the Republican Congress and Trump will repeal that act.

1:28:24

They might kind of revise it, but it does seem more likely that that will stay in place and continue being a factor.

1:28:31

So that's, I guess, this article's summary and our best guess at the implications of a Trump presidency for AI.

1:28:40

We will have to wait and see what happens in practice.

1:28:44

And speaking of evading AI sanctions from the U.

1:28:48

S., the next article is Fab Whack-a-Mole: Chinese Companies Are Evading U.

1:28:54

S. Sanctions, and this is a bit of an overview, I suppose. So it's talking about the need for AI competitiveness, covering the sanctions, and talking about how

1:29:08

companies such as Huawei are exploiting various loopholes to acquire advanced semiconductor manufacturing equipment, which is then enabling them to build large AI clusters.

1:29:21

So again, Jeremy, I'll let you take over on this one since this is your wheel-

1:29:26

Jeremie: house. Oh, yeah. Well, I thought that's okay.

1:29:28

So, so I will always shill, uh, SemiAnalysis any chance I get, uh, SemiAnalysis is an amazing newsletter.

1:29:34

If you're into AI hardware stuff, hardware stuff, I should say in general, uh, go check them out.

1:29:39

The blog posts are really technical.

1:29:41

So unless you kind of know the hardware space, tough to, tough to justify a subscription if you're not getting all the value out.

1:29:50

But, um, they are, if you're in that space, I mean, you're probably already subscribed.

1:29:54

These guys are amazing. Um, so this is, yeah, a report on the,

1:29:59

uh, really, uh, difficult enforcement challenges that are facing the Department of Commerce and BIS as they look to, to enforce their export controls on AI chips.

1:30:10

But, um, I just want to give you, uh, an excerpt from this report.

1:30:14

Uh, they're talking about, um, SMIC, which is

1:30:18

China's answer to TSMC, obviously.

1:30:20

Uh, so they produce all of China's leading, uh, leading nodes on the hardware side.

1:30:24

So they say sanctions violations are egregious.

1:30:27

SMIC produces seven nanometer class chips, including the Kirin 9000S mobile, uh, SOC system on a chip and Ascend 910B AI accelerator.

1:30:36

Two of their fabs. Okay, two of their fabs are connected via a wafer bridge.

1:30:40

Okay, so wafer is the thing that you, it's like this big circular thing that, that is made of silica and silicon.

1:30:48

And you, that's what you etch your, your circuits in.

1:30:51

And anyway, this is the starting point for your fab process.

1:30:54

Um, so two of their fabs are connected via wafer bridge such that an automated overhead track can move wafers between them.

1:31:01

But for production purposes, this forms a continuous clean room and effectively one fab.

1:31:06

But for regulatory purposes, they're separate.

1:31:10

One building is entity listed by the U.

1:31:12

S. In other words, one building is owned by an entity that's on a blacklist.

1:31:17

You're not allowed to sell advanced AI logic to them.

1:31:19

Um, and because of national security concerns, whereas the other one is free to import these, like, you know, dual-use tools, and it claims to only run

1:31:28

legacy processes and yet they're connected by this physical fucking bridge.

1:31:33

Like this is how insane it is. You basically have one facility, and we're just going to trust China and SMIC that they're not sending, like, a wafer right when it should be going left, type of thing.

1:31:45

That's the level things are on. They go into detail on stuff that we've been tracking for a long time.

1:31:50

So, um, there is a, a fab network that is being run and orchestrated by Huawei, where they spin up new, um, subsidiaries basically as fast as they can to evade US

1:32:01

export controls. Right now, US export controls work on a blacklist basis.

1:32:05

So you basically say, okay, we're gonna name new entities and organizations

1:32:09

you are not allowed to sell advanced semiconductor manufacturing equipment to, um, and we try to keep that list fresh.

1:32:15

Well, Huawei is just gonna kind of create new, spawn new entities as fast as they need to.

1:32:20

And they have this vast network now, um, that is basically moving Huawei into the center of what you might think of as China's,

1:32:27

maybe, AI ambitions. Like, if you start to think about what is the, um, not even the OpenAI of China, but what is, what is the coordinating entity for a lot of China's big-scale

1:32:35

AI work, it is increasingly Huawei, both on hardware and software.

1:32:38

So, uh, there are all these, uh, these pushes to, to get Huawei looked at and, and all this and, and what the, um, this report argues for,

1:32:47

and I think is quite sensical, is, uh, you need to start to think about

1:32:51

tightening in a broader way your, uh, your export control requirements.

1:32:55

So instead of just saying, Oh, look, we've got a blacklist and we're going to try to keep that blacklist fresh.

1:33:00

Um, instead using, let's say, a wider range of tools to require that any material that is at all US-, uh, fabricated in the whole supply chain, that that can't be shipped.

1:33:11

So, so even if you're at ASML, you're building something that has any component of US technology in it.

1:33:16

If you ship that to, uh, to China, that's a no-no. Like, these broader tools are becoming necessary just because otherwise you're playing this whack-a-mole game that you're destined to lose.

1:33:26

And at this point, the stakes are just, just way, way too high.

1:33:28

So, um, by the way, I say this.

1:33:31

You know, SemiAnalysis, they are AI accelerationist, uh, in their bones, right?

1:33:35

This is like, they are not kind of AI safety pilled, uh, as far as I can tell, it's quite the opposite.

1:33:40

And here they are saying, no, no, like we need to like fucking ban, uh, the export of this hardware to China in a very robust and unprecedented way.

1:33:48

I think this makes all the sense in the world. If you believe this is ultimately dual use technology, then that's what you got to do.

1:33:54

Like we can't be updating blacklists every 20 minutes.

1:33:57

Andrey: And just a couple more stories. The next one is, uh, very much related to that previous one, actually, an example of sanctions violations.

1:34:06

So the story is that the U. S. has fined the company GlobalFoundries for shipping chips to a sanctioned Chinese firm.

1:34:16

So this is, uh, a $500,000 penalty on this New York-based company, GlobalFoundries.

1:34:22

It's the world's third-largest contract, uh, chipmaker, and it has shipped chips without authorization to an affiliate of, uh, SMIC, the Chinese, uh, chipmaker.

1:34:37

And this was 74 shipments of $17.

1:34:41

1 million worth of chips to this company, SJ Semiconductor, which is affiliated with SMIC.

1:34:50

Interestingly, this also says that GlobalFoundries voluntarily disclosed this violation and cooperated with the Commerce Department.

1:35:02

And there was a statement from the Assistant Secretary for Export Enforcement, Matthew Axelrod, that says, we want U.

1:35:10

S. companies to be hyper vigilant when sending semiconductor materials to Chinese parties.

1:35:16

And Jeremie: GlobalFoundries came out and said they regret, quote, "the inadvertent action due to a data entry error made prior to the entity listing."

1:35:24

So a data entry error blamed for this. Look, probably true.

1:35:28

Uh, and this, this stuff is really difficult to enforce, especially when you have a very complex set of layered, kind of, requirements and all this stuff.

1:35:37

Like, you know, the, the rules right now are not simple.

1:35:39

And that, that is a challenge for enforcement.

1:35:42

Um, you know, so, so maybe no surprise to see this is yet another, another kind of leaky situation.

1:35:47

Obviously TSMC had, had similar issues recently, right?

1:35:51

They accidentally sell, sold some stuff to, uh, uh, to, to a Huawei affiliate.

1:35:55

But this is just what happens. It's part of the reason why you just need stronger incentives, right?

1:35:59

If companies like GlobalFoundries are running processes that are subject to these kinds of errors, then that just implies, okay, they need to try harder.

1:36:09

The incentives just need to be stronger. Um, you know, to, to kind of bump back to that semi analysis report that we were talking about earlier, uh, one of the, the call outs that they make is, you know,

1:36:18

the, the, um, industry side of this has been claiming that this would wreck,

1:36:22

uh, you know, tighter export controls would wreck industry and blah, blah, blah.

1:36:26

And, and they've actually been doing better, not worse, um, uh, including, you know, decent sales to the Chinese market, um, uh, in the last few years, this has

1:36:33

been an absolute boom time for them in spite of increasingly tight export control.

1:36:38

So the economic argument may be faltering a little bit here.

1:36:41

Um, but, uh, but yeah, we're seeing, in real time, these holes kind of appear and, yes, you know, get plugged. Like, this, this will get plugged, and then there are going to be new holes.

1:36:49

It's this, yeah, never-ending game of whack-a-mole, again to, uh, borrow from the SemiAnalysis, uh, post title.

1:36:55

Andrey: And last up, the story is that Anthropic has teamed up with Palantir and AWS to sell its AI to defense customers.

1:37:06

Uh, quite related to the story last week, we had with Meta, uh, altering their license, their user agreement to let defense in the U.

1:37:15

S. use it. Uh, now the, this collaboration would allow Claude, the chatbot from Anthropic, uh, to be used within Palantir's defense-accredited environment, this Palantir Impact Level 6, I

1:37:29

don't know what this is, reserved for systems containing data critical to national security.

1:37:35

So Anthropic previously has, uh, you know, I guess, prevented this kind of use of Claude, or at least precluded it in their agreements, for U.

1:37:45

S. defense customers. And, uh, per this article and per what we discussed last week, this seems to be part of a general trend.

1:37:54

Jeremie: Yep. Anthropic I have heard has been really transparent, um, internally with their own, uh, their own teams about this and, and the deliberative process behind it.

1:38:03

I mean, I actually think, you know, this is, you, you want, um, an AI safety focused org to be working with the U S government to have

1:38:10

them understand what's going on, including in defense contexts.

1:38:14

Uh, and this is going to be for, yeah, in intelligence analysis, that sort of thing.

1:38:17

Um, so yeah, I mean, like, I actually think they're going to face a lot of flack for this.

1:38:21

I think this is a, a good move. Um, and, um, and the Palantir partnership is, is actually going to be really important for them too.

1:38:28

Cause you know, selling into DOD is hard. Uh, you want, you want to work with someone who really understands that process.

1:38:34

So, uh, yep. This is, uh, another, another big boon for, uh, anthropic potentially because that market is also just, it's really big and it's what, you know,

1:38:41

You know, anthropic needs to do to understand both their, their customer.

1:38:45

They're a really big potential customer. Well, and also for their own mission, they need to be able to integrate, integrate, integrate tightly with the U S government, with

1:38:53

the national security parts of the U S government and all that stuff.

1:38:55

So, uh, yeah, we'll, we'll see where this goes. And if we end up seeing more reporting about this, uh, this deal.

1:39:00

Andrey: Yeah. And then speaking of the government, this news also covers that, uh, Claude has come to AWS GovCloud, which is a service designed for U.S.

1:39:11

government cloud workloads. Wasn't aware there was a GovCloud, but that's neat.

1:39:16

So seemingly it's not just for military.

1:39:18

It's also just in general for use within the U.

1:39:23

S. government. And that will be it for this episode of Last Week in AI.

1:39:28

Once again, you can go to the episode description for links to all the stories.

1:39:33

Also go to lastweekin.

1:39:35

ai for those links and for the text newsletter.

1:39:39

We always appreciate your comments, your reviews, your tweets, all of those things.

1:39:46

But more than anything, we do appreciate you listening.

1:39:48

So please keep tuning in and hopefully enjoy this AI song.

1:39:54

That is not terrible.
