Episode Transcript
0:00
Your network doesn't operate in a
0:02
vacuum. Every change you make has
0:04
a direct business impact. So why
0:06
make changes quietly in your silo?
0:08
Orchestrate your network automations
0:11
to integrate with the rest
0:13
of the business using ITential.
0:15
Visit ITential.com to find out
0:17
more. That's ITential.com. On
0:19
today's heavy networking, we will discuss
0:22
building a Slack bot wired to
0:24
an AI and trained on
0:26
your own organization's knowledge. The
0:28
potential use cases for network
0:30
operations are fascinating, and indeed,
0:32
we know of companies like
0:34
Selector AI that are training models
0:36
on real-time network infrastructure telemetry,
0:38
changing how we manage our networks. I'm
0:41
Ethan Banks, along with Drew Conry-Murray.
0:43
Follow us on LinkedIn, Bluesky, and
0:45
the Packet Pushers Community Slack. Our
0:47
guest is Kyler Middleton. She's the co-host
0:49
of the Day Two DevOps podcast with
0:51
Ned Bellavance. So if her voice sounds
0:54
familiar, that might be why. Kyler's been
0:56
publishing a detailed, instructive series on her
0:58
Let's Do DevOps Substack about her AI-enabled
1:00
Slack bot. And she definitely draws the
1:02
rest of the owl. And she was
1:04
gracious enough to share even more
1:07
time with the community to record
1:09
with us about what she's built,
1:11
lessons learned, and suggestions for what
1:13
the rest of us might be
1:15
able to create, inspired by her
1:17
project. So,
1:19
Kyler, welcome to Heavy Networking. And so
1:22
these articles that you've posted on
1:24
your Substack: some of
1:26
your Substack is paid and some
1:28
of it's not. What is the status of
1:30
these articles? Yeah, this is an ongoing series.
1:32
I sort of have these ideas of like,
1:35
could I build that? And then
1:37
a week later, I come out of
1:39
a caffeine haze and I think, oh, okay,
1:41
I could. I've ignored my job for a
1:43
week, but I built it. So the leading
1:45
maybe two or three articles are paid and
1:48
the rest of it's free. The status of
1:50
this series is 22,000 words so far, and
1:52
we're on the first two in a series that's
1:54
going to be maybe eight, because I want
1:56
to stream the tokens back to Slack. We'll
1:58
get into that. Just like ChatGPT, I'm
2:00
building ChatGPT basically, but for your
2:03
private enterprise; that'll be like article 8
2:05
or something like that in this series.
2:07
Okay, I'd read the first four, you
2:10
just published part five, which gets into
2:12
some of the RAG and augmentation stuff,
2:14
and there's gonna be three more after
2:16
that, at least you're saying, goodness. Yep,
2:19
absolutely. I want to do interstitial little
2:21
posts that say, hey, I'm talking
2:23
to the knowledge base.
2:26
Okay, now I'm chatting with AI, and
2:28
then start streaming it back to Slack.
2:30
So it's not just a post, and
2:32
then 10 seconds later you get an
2:35
answer. So it's not that you drew
2:37
the owl, you drew the owl in
2:39
a tree in a forest with a
2:41
lovely lake nearby. Exactly. The owl has
2:44
an ecosystem that it lives in.
2:46
Really, this is all intentional, and
2:48
it seems ridiculous, but I want
2:51
to make this very accessible. And
2:53
the way that I can make
2:55
it the most accessible is to
2:57
include
3:00
the whole owl. And I want you
3:02
to be able to follow along. All
3:04
this code is published. All the
3:07
code, it's MIT Open Source. If
3:09
you want to go steal it,
3:11
if you want to go sell
3:13
it, that's fine. It's MIT Open
3:16
Source. Go do it.
3:18
It's on GitHub. But really, I would
3:20
love for you to implement this in
3:23
your own enterprise, because it's useful. And
3:25
it can do useful stuff. And I
3:27
don't want anyone to be excluded, because
3:29
they don't know what a lambda is,
3:32
or they don't know how to write
3:34
Python code. I can do that part.
3:36
I have provided all that complexity. Really,
3:39
it's a lot of knowing the
3:41
right libraries, knowing how to construct some
3:43
of the right statements, but it's not
3:45
like thousands of lines of complex Python.
3:48
Not at all. It's under a
3:50
thousand lines of Python. It's using some
3:52
external stuff, like the Boto3 library
3:54
from AWS, but it's not even a
3:57
thousand lines of code. And you
3:59
can ask an AI to explain all
4:01
the steps. So, yeah. Okay, so we
4:04
should zoom out. Let's pretend we're the
4:06
AI. And if you've got to explain
4:08
to someone at a high level what
4:10
you've built here, I mean, there's a
4:13
slack bot, it's tied to an AI
4:15
model. Tell us what this thing is.
4:17
Totally. The elevator pitch of the three
4:20
words of like, you know, the Facebook
4:22
for ice cream is, this is a
4:24
ChatGPT for your private enterprise. So
4:26
I'm in a regulated industry, that's my
4:29
primary job is at Veradigm as a
4:31
software engineer, and we're in health care.
4:33
And we have, you know, a lot
4:36
of health care data. And that means
4:38
that we need to be really cautious
4:40
with what we let people upload and
4:42
where our data goes, because our CISO
4:45
could like go to prison if we
4:47
do a bad enough job of this.
4:49
Gen AI is really powerful, right? It
4:52
hallucinates and does nonsense things, but occasionally
4:54
is brilliant and it's really helpful for
4:56
writing code and doing, you know, all
4:58
sorts of stuff. Tell me a poem
5:01
about a pirate. Tell me about a
5:03
kitty. I've been giving my three-year-old stories
5:05
from ChatGPT the past few nights.
5:07
And, uh, it's so useful, but it's
5:10
so excluded for like regulated industries because
5:12
all your data is being collected and
5:14
trained on the Facebook model, the Google
5:17
model. It's free because you're the product.
5:19
And so at these companies, you just
5:21
shouldn't. You can't. Well, you can, but you
5:23
shouldn't be using ChatGPT or Google's public
5:26
models or DeepSeek, probably. So what
5:28
I wanted to do is bring that
5:30
power to industries that are excluded because
5:33
of their privacy, like governments and finance
5:35
and health care, and be able to
5:37
use it privately. So I'm using it
5:39
and I'm pitching it internally as, like,
5:42
you can have it analyze contracts, you
5:44
can have it read resumes and give
5:46
you information, and, like, you should never
5:49
do those things with public AI. But
5:51
you can do it with private AI
5:53
safely, and that's pretty cool. That's
5:55
the goal. So give us some use
5:58
cases that might be interesting for infrastructure
6:00
engineers, people that, like, for this audience,
6:02
folks that manage network infrastructure. I have
6:05
been staggered at the amount of what
6:07
you would probably consider like expert level
6:09
expertise at writing Splunk queries. Like if
6:11
you don't understand what KQL is or
6:14
how to write a Splunk query in
6:16
it, you can have this tool do
6:18
it and it does an excellent job.
6:20
And this gets on to where I'm
6:23
building towards, but I've had it read
6:25
our entire Confluence, because that's supported by
6:27
the AWS Bedrock data source toolkit. It's
6:30
beta, but it works. And we have
6:32
a bunch of guides on how to
6:34
write Splunk or how to write Terraform
6:36
in our standards. And so now this
6:39
model spits out perfectly formatted Splunk queries
6:41
or Terraform or ACL updates, or tells
6:43
you exactly how to apply for an
6:46
exception to our manual change policy, with
6:48
our corporate standards, immediately. And you don't
6:50
even have to talk to a human,
6:52
which some engineers really appreciate. I do
6:55
some days, too. So can you walk
6:57
through sort of the high-level big pieces
6:59
of the system that you've put together?
7:02
Yeah, absolutely. So this uses the Bolt
7:04
framework from Slack, which sounds scary, but
7:06
really it's just a little Python library
7:08
that you can use. So Slack is
7:11
the interface where I put my query
7:13
to start this whole mechanism running? Absolutely.
7:15
Yeah, let's start from there. So you
7:18
go into Slack and I have a,
7:20
excuse me, a Slack app that's registered
7:22
in Slack, and you can either direct
7:24
message it or tag it into a
7:27
shared room. And I figured that's the
7:29
best place to start. I could build
7:31
a web page or something, but everyone's
7:33
in Slack or everyone's in teams, which
7:36
I'm going to build in the future.
7:38
And when you message this bot, which
7:40
I call Vera, which is the Latin
7:43
word for truth, we'll see if AI
7:45
can stick to that, she'll do her
7:47
best. And it sends a webhook out
7:49
to a Lambda function URL that spins
7:52
up a Lambda that's written in Python.
7:54
Why is it in Lambda and not, you know,
7:56
just a server, just a server running
7:59
somewhere? Patching is terrible, and eventually your
8:01
server has to reboot, and then it
8:03
breaks your thing. And Lambda doesn't ever
8:05
reboot. And I love that. Lambda is a
8:08
serverless service from AWS, yes. Yeah, absolutely.
8:10
Okay, so that means you're not running
8:12
infrastructure to support this because you're using
8:15
these tools. Not at all. It's Python 3.12, which
8:17
is supported until 2028. There's no underlying
8:19
operating system that I have to patch
8:21
or reboot or monitor or anything. It
8:24
just spins up and processes a conversation
8:26
and then spins down. There's also the
8:28
side benefit that it can scale out
8:31
almost indefinitely. So if I want to
8:33
have 10,000 conversations at once, I could,
8:35
I'm never going to get there with
8:37
this product, but it's possible. And the
8:40
bill would probably be staggering. It would
8:42
be. Well, it's kind of surprising because
8:44
I have numbers for the costing and
8:46
it's almost nothing. I've processed about 2,000
8:49
conversations so far and it's cost about
8:51
12 bucks. So it's really cheap. Comparatively...
8:53
let's talk about that later, because I
8:56
have so much to say about that. So the
8:58
Lambda gets the conversation. It reads the
9:00
entire Slack thread using Bolt's
9:02
API endpoints for Slack and constructs the conversation
9:05
that it sends over to the Bedrock
9:07
APIs. Bedrock is an AWS AI
9:09
endpoint system. We're using what's
9:12
called serverless on their side, which
9:14
means you don't have to have an
9:16
AI model provisioned. Starting cost: $30,000, which
9:18
I'm not quite ready for, for this,
9:21
you know, homegrown lab thing and There's
9:23
a little bit more safety and security
9:25
built in on the bedrock side, but
9:28
I'll skip all that stuff and Bedrock
9:30
lets you pick which models are going
9:32
to be sort of the base of
9:34
this. Yeah, absolutely. I'm using Anthropic's Claude
9:37
3 Sonnet, but you can pick whatever
9:39
you would like. Whatever is in Bedrock,
9:41
that is, but they've got a huge
9:44
selection. Exactly. Yeah, you can import your
9:46
own, but again, when you import a
9:48
model or train a model, they run
9:50
it for you, and the base cost is around
9:53
30 grand a
9:55
month. So unless your product is built
9:57
around this, it's just out of reach
9:59
for everyone. So as long as it's
10:02
available in their serverless library, which is
10:04
a ton of stuff you've heard of,
10:06
the OpenAI models are available, Claude's
10:09
models, Google's Gemini models are all there,
10:11
and they're charged based on tokens, and
10:13
it's something like a million tokens for
10:15
five bucks. And most of these conversations
10:18
use about 500 tokens. So the math
10:20
of that is staggering. It's almost nothing.
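The back-of-the-envelope math here is easy to sketch. This uses only the ballpark figures from the conversation (about $5 per million tokens, about 500 tokens per conversation); actual Bedrock pricing varies by model, and the observed ~$12 presumably also includes Lambda and other AWS charges:

```python
# Rough token-cost math using the ballpark figures from the conversation:
# ~$5 per million tokens, ~500 tokens per conversation.
PRICE_PER_MILLION_TOKENS = 5.00   # USD, ballpark serverless rate
TOKENS_PER_CONVERSATION = 500     # typical conversation size

def token_cost(conversations: int) -> float:
    """Estimated model cost in USD for a number of conversations."""
    total_tokens = conversations * TOKENS_PER_CONVERSATION
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# 2,000 conversations is about 1M tokens, so about $5 of model spend.
print(round(token_cost(2_000), 2))
```

Compare that with a seat-based product at $10 per user per month for 50 users, which is $500 a month regardless of usage.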
10:22
Especially compared with enterprise products that serve
10:25
this, where they're seat-based and you have
10:27
to pay per user: you have 50 users,
10:29
so you have to pay $10 a
10:31
month for each user. That's $500 a
10:34
month and this is going to serve
10:36
that same need for like maybe the
10:38
cost of two or three Starbucks a
10:41
month. It's really a huge difference. The
10:43
Slackbot itself, I had built a Slackbot
10:45
before Slack retooled how you build an
10:47
application in Slack. This goes back a
10:50
few years, but you had to have
10:52
some code that was basically sitting there
10:54
listening
10:57
to the Slack channel and reacting. Is
10:59
there still a piece like
11:01
that, or with the new way you
11:03
do Slack bots these days, does
11:06
Slack kind of do that for
11:08
you or? That's how I develop it
11:10
locally is I run the Python thing
11:12
and it starts the Slack listener that
11:15
I connect to via, like, an ngrok
11:17
endpoint, which is an open source tool
11:19
that lets you receive public web hooks
11:22
and send them to your local listener.
11:24
But for the real production one that's
11:26
running in Lambda, it uses what's called
11:28
a function URL, which means if it
11:31
receives a connection, the listener
11:33
is the Lambda infrastructure from AWS: it
11:35
spins up your Lambda in about a
11:38
quarter of a second and starts processing
11:40
the event, like right away. So it's
11:42
not sitting there charging you money, it
11:44
just is ready to run your Lambda.
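As a rough illustration of what a function-URL Lambda does when Slack calls it, here is a minimal stdlib-only sketch. The real bot uses Slack's Bolt framework and boto3, so the handler body here is illustrative; the `url_verification` challenge echo, though, is how Slack validates an Events API endpoint:

```python
import json

def lambda_handler(event, context):
    """Entry point AWS invokes when the function URL receives a request.

    Function URL events carry the HTTP body as a string; Slack sends JSON.
    """
    body = json.loads(event.get("body") or "{}")

    # Slack's Events API sends a one-time url_verification handshake when
    # you register the endpoint; you must echo the challenge back.
    if body.get("type") == "url_verification":
        return {
            "statusCode": 200,
            "body": json.dumps({"challenge": body["challenge"]}),
        }

    # Otherwise it's a message event: read the thread, call Bedrock, reply.
    # (Elided here; the real bot does this with Bolt and boto3.)
    return {"statusCode": 200, "body": "ok"}
```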
11:47
And the trigger is to spin that
11:49
up: when an input is received from
11:51
the Slack command line and goes into
11:54
Slack, there's a webhook that fires,
11:56
reaches into Lambda, because I've read this
11:58
part of your post, then the Lambda
12:00
instance spins up and begins processing. Yeah,
12:03
and it uses IAM permissions. So
12:05
there's almost no static keys. Static
12:07
keys are the worst. We know that
12:10
as infrastructure engineers. So it's all
12:12
IAM dynamic stuff. Everything on the AWS
12:14
side is keyless authentication. They're the worst
12:16
except for their incredible convenience. Oh, they're
12:19
so convenient. And they don't require fetching
12:21
anything. But IAM is such a bear
12:23
to learn at first. But now that
12:25
I've got it, I'm starting to like
12:28
it. It only took 10 years. I'm starting
12:30
to like it. Okay, Bedrock. We've talked
12:32
about this a lot, and Bedrock, as
12:35
I understand it (correct me if
12:37
I'm wrong), is the service that provides
12:39
the AI model. And AWS is notorious
12:41
for having many services. So what is
12:44
Bedrock specifically, and why did you select it over
12:46
any other AI-related options that AWS
12:48
might offer? Well, I actually kind of
12:51
didn't. I started building this with Lambda
12:53
because I built a project with that
12:55
previously and I like it. And I
12:57
was going to send everything over to
13:00
Azure AI because I don't know. That's
13:02
what my company's kind of standardized on.
13:04
And I really like how they've done
13:07
their AI models. It's the same as
13:09
bedrock, but it's hosted in the Azure
13:11
cloud from Microsoft. And then one of
13:13
our architects said, well, you'd send all
13:16
the traffic to AWS so you can
13:18
send all the traffic to Azure? That
13:20
doesn't make any sense. And I thought
13:23
I can either update this to the
13:25
Bedrock AI endpoints from AWS, or I
13:27
can rewrite this Lambda as a function
13:29
URL, and serverless is so different between
13:32
clouds. And so is authentication and so
13:34
is all the standards they use for
13:36
just how things run at work. And
13:38
I thought I would much rather learn
13:41
Bedrock than learn how function URLs actually
13:43
work in Azure. So I said, we're
13:45
going to AWS. So is bedrock the
13:48
only AI-related service that I would
13:50
be considering? There's so many services in
13:52
AWS, I don't keep up. Is
13:54
Bedrock it? There's so many services, and
13:57
I don't understand all of them. There
13:59
are some other machine learning services that
14:01
serve needs like this, but for
14:04
smaller projects like this, where you're processing
14:06
an input, especially this conversational gen AI
14:08
type AI, bedrock is probably where you're
14:10
working at. They're starting to put all
14:13
of their serverless models and their simple
14:15
guardrails, which sort of monitor for, like,
14:17
inappropriate content in and out of models.
14:20
That's all in bedrock. So yeah, that's
14:22
probably where you're starting. The examples are
14:24
really good. Some of them are hidden
14:26
away in GitHub example repos that are
14:29
a little hard to find, which means
14:31
you can read, from Kyler's blog Let's
14:33
Do DevOps, how to do it.
14:36
And like that has all the pictures
14:38
and stuff. This changes so much, especially
14:40
behind the scenes, but some of the
14:42
like front end changes too that it's
14:45
hard to follow along with AWS docs in
14:47
any real way and have them make
14:49
sense because it's changing, you know, like
14:51
the whole field is changing. It's not
14:54
AWS's fault. This is just, it's a
14:56
moving target for technology. So I think
14:58
I heard you say there are, you
15:01
could also access publicly available models
15:03
in Azure, but you didn't want to
15:05
learn Azure serverless functions, so you just
15:07
stuck with Lambda. So does that mean
15:10
if I'm in Azure or I'm in
15:12
GCP, they also have similar services to
15:14
bedrock? Yeah, absolutely. I haven't done a
15:17
lot with GCP, but Azure definitely does.
15:19
I like their implementation a little bit
15:21
more than Bedrock. On the AWS side,
15:23
if you want to use a guardrail,
15:26
which is sort of monitoring for inappropriate
15:28
input and output, you specify it in
15:30
your code, strangely. So like, if someone
15:33
wanted to bypass it, they could comment
15:35
it out. And then you no longer
15:37
have any guardrails. And that's a
15:39
strange choice. In Azure, you deploy a model
15:42
to an endpoint, and when you
15:44
do so, you specify a guardrail, and
15:46
it's just implicitly invoked whenever you talk
15:49
to the model. So there's no, you
15:51
know, bool flag that you pass them
15:53
that says, hey, don't check this for
15:55
safety this time. You just have to
15:58
use it, which... I much prefer
16:00
that standard. Oh boy, I bet we're
16:02
going to be hearing about that someday.
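To make the contrast concrete: with Bedrock's Converse API, the guardrail rides along in each call's `guardrailConfig`, so it lives in code that someone could comment out. A hedged sketch follows; the guardrail identifier is a placeholder, not a real resource:

```python
# Sketch: on Bedrock, the guardrail is attached per-call in your code.
# Identifiers below are placeholders, not real resources.

def converse_kwargs(user_text: str, with_guardrail: bool = True) -> dict:
    """Build kwargs for bedrock-runtime's converse() call."""
    kwargs = {
        "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
    }
    if with_guardrail:
        # This is the part that could simply be commented out,
        # which is the risk being discussed.
        kwargs["guardrailConfig"] = {
            "guardrailIdentifier": "my-guardrail-id",  # placeholder
            "guardrailVersion": "1",
        }
    return kwargs

# client = boto3.client("bedrock-runtime")
# client.converse(**converse_kwargs("hello"))
```

On Azure, by contrast, the content filter is bound to the deployed endpoint, so there is nothing in the calling code to delete.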
16:04
Yep, someone will have forgotten it
16:07
or their code will like turn it off
16:09
on accident and then it just goes
16:11
nuts because they do. And I've actually
16:13
had the AI go crazy a couple
16:15
of times with some of my bad
16:17
code, which is really fun to watch
16:19
your software project go absolutely loony. It's
16:21
been a ton of fun. But the
16:23
general rule, if you are paying for
16:25
the AI per like... tokens, especially
16:27
these cloud platforms, your privacy
16:29
will be respected. And I've
16:31
seen, I have read good things
16:34
from GCP and Azure and AWS
16:36
for their AI services. But if
16:38
you're paying for something like an OpenAI,
16:40
like these other sort of
16:42
public AI platforms, they don't have
16:44
a proven track record where they're
16:46
respecting your privacy. So especially for
16:48
like regulated industries in finance and
16:50
healthcare, I would be very cautious
16:52
using those. But for these hyperscaler
16:54
platforms. I feel much more comfortable.
16:56
They have access to so much
16:58
of our data already in S3
17:00
buckets and databases and servers, but it
17:03
just doesn't make a dent in the risk
17:05
that we're exposing ourselves to to, you know,
17:07
have our AI there too. We're already screwed,
17:09
so you might as well keep going. Kind
17:12
of, yeah, all our eggs are in the
17:14
basket. Let's put another egg on top. Side
17:16
note here, I'm curious, did you... ask anybody
17:19
at work in a regulatory
17:21
position or compliance position about
17:23
this project before you started
17:26
or you're just going for
17:28
it? I did eventually ask for
17:30
permission, but I wanted to see
17:32
if it would work first. And
17:34
that's really a bad standard. But
17:36
I started with data that we
17:38
don't really care if people read. So
17:41
our Confluence is, like, the wiki from
17:43
Atlassian. And that data, that has historically
17:45
just been open to everyone. Everyone can
17:47
read, everyone can write for almost everything.
17:49
So I thought, if this just goes
17:51
crazy and it starts to spit out random
17:53
facts to people, it doesn't really matter,
17:55
because everyone has access to this data
17:58
already; it's just provided in Slack instead
18:00
of a web browser. What I want to get this
18:02
to eventually is the sort of transitive
18:04
security model where if, you know, Ethan's
18:07
permissions are different than Drew's permissions, when
18:09
you talk to the model, the AI
18:11
can access a different tier of data
18:13
or different sources of data and your
18:16
sort of permission schema gets transitively applied
18:18
to what the model can do. That's
18:20
well beyond what models can do, like,
18:22
state-of-the-art-wise. I might be
18:24
able to kind of hack it together with
18:27
different knowledge bases and data sources and sort
18:29
of turn them on or off for different
18:31
people, but that's really a hacky solution to
18:33
this problem that would be better served with
18:35
something much more elegant. So that's... And I
18:37
can just ask the AI to pretend I'm
18:39
Ethan and then give me Ethan's...
18:42
Absolutely, absolutely, yeah. Aren't
18:44
you devious? Wow, right there.
18:46
Thinking two steps ahead, man.
18:48
A quick sponsor message from
18:51
the network orchestration folks
18:53
at ITential. Automating
18:55
network configuration change is
18:57
a major milestone for a netops team.
18:59
Well, what is next? Orchestrating the entire
19:02
workflow. Because network changes don't begin when
19:04
you kick off a script. Network changes
19:06
begin with a business process, such as
19:09
a ticket coming in, and then you
19:11
have a change to meet the needs
19:13
of that ticket proposed. Testing is
19:15
then performed to make sure the change
19:18
will do the right things, and then
19:20
a human approves, and the change is performed,
19:22
and post-deployment testing is done, and notifications
19:24
are sent out, and the ticket is
19:26
updated, and your process no doubt varies,
19:28
right? But you get the idea. Now, what
19:30
if you could take that entire
19:33
workflow and orchestrate it so that
19:35
your manual interaction with the ticketing
19:38
system and with ServiceNow and
19:40
so on, if that's all handled
19:42
by an automated workflow that you
19:45
designed to work with the specific needs
19:47
of your shop, that would make
19:49
you more efficient, right? It would
19:51
increase the likelihood of not only
19:54
the change getting done, but the
19:56
entire business workflow being
19:58
completed without error. You
20:00
get a complete set of tooling that
20:02
gives you all the power you could
20:04
need to have your company's IT and
20:07
business platforms interacting smoothly. When your
20:09
network operations are ready to evolve
20:11
to robust orchestration of your network
20:13
changes, ITential should be on your
20:16
short list of platforms to evaluate.
20:18
To find out more about ITential's
20:20
products and how they help
20:22
you orchestrate network automation workflows,
20:25
visit ITential.com. That's ITential.com, and
20:27
tell them Packet Pushers sent
20:29
you. Kyler, you mentioned Python
20:31
3.12. Is that critical for the
20:33
functioning of this? Or could I
20:36
get away with something older? I
20:38
mean, lots of default Python installations
20:40
for OSes are somewhat older than
20:42
3.12. You could totally do something older.
20:45
I picked 3.12 because it was the
20:47
most recently supported Python 3 for
20:49
Lambda in AWS that I would
20:51
not have to patch. Because just...
20:53
Engineers are really lazy and I
20:56
fit this model very very well. I
20:58
don't want to do this again. And
21:00
so by 2028, you know, maybe I'll
21:02
have another job. Maybe I'll have moved
21:04
on to a different thing. Someone
21:06
else will solve that problem. I've
21:09
put it so far out in the future.
21:11
And that's the goal. So use as
21:13
new as you can because then you, you
21:15
know, it's probably someone else's
21:17
problem when it breaks in a couple years
21:20
when it goes end of life. In the long
21:22
run, does it matter? Or what kind of
21:24
things should I be thinking about when I'm
21:27
going through with a vast menu of
21:29
models to pick from? There's so
21:31
much to this answer, but let's start with
21:33
just testing models. So the one that I
21:35
built is built on Anthropic Claude 3
21:37
Sonnet, which is the latest model
21:40
from the Anthropic
21:42
company, which is one of those sort
21:44
of big companies building AI models. And
21:46
you can absolutely test these models on
21:49
the bedrock platform. So within the console,
21:51
you can do side by sides with
21:53
I think up to three different models and
21:55
ask them the same questions or ask the
21:57
same model with different parameters, like
22:00
their temperature and their top-p,
22:02
of, you know, generate an answer
22:04
to this question. And you can
22:06
sort of measure how they do.
22:08
And so that's what I did first
22:11
is, like, do some big models. I
22:13
did OpenAI's, I think,
22:15
o3, and Anthropic Claude and
22:17
Gemini and Titan from AWS,
22:19
and just see how they do.
22:21
And Anthropic was quite a bit
22:24
better at understanding programming, which I
22:26
built this first of all to
22:28
be like a programming assistant for
22:30
our software engineers and our SRE
22:32
team. And so that was kind
22:34
of an easy choice for us
22:36
to do. But we're using this
22:39
particular API from AWS called the
22:41
Converse API, and that's a fancy
22:43
word for: it's sort of a meta
22:45
API where it has a standard interface
22:47
no matter what model you use because
22:49
all these models, they're built a little
22:51
different. Their APIs are different for how
22:53
they expect data, and the formatting
22:56
of documents, etc. So the Converse API
22:58
standardizes that: it's one API call, and
23:00
it can talk to any model on the
23:03
back end. They sort of reformat your
23:05
API call and pass it to the
23:07
model, which is really cool in terms
23:09
of like it has some support for
23:11
document types like you can pass it
23:13
a spreadsheet and it'll understand it where
23:15
the models might not. But the side benefit
23:17
of that is I can flip over
23:19
to a different model in about five
23:21
minutes. I don't have to reformat how
23:23
I'm constructing all those API calls. You just
23:25
specify a different model name, and Converse
23:27
will convert it for that and send it
23:29
over. So big fan of that, it has
23:32
worked really well. And if we ever decide
23:34
that, you know, that 4o from OpenAI is
23:36
looking really cool, we can probably just
23:38
test it out by changing to a
23:40
different name when
23:42
you're declaring which model you want
23:44
to talk to. Accuracy of response,
23:47
speed of response, were there other things you
23:49
were looking for? No, that's kind of
23:51
it, and it's all very gut-feelingy.
23:53
It's very unscientific at this point, because
23:56
this is very much a lab project I
23:58
just built by myself. For real
24:00
AI engineers that are building stuff
24:02
that handles like health care
24:04
data and other like user
24:06
facing stuff, there's testing suites
24:08
where you pass in, you know, 500
24:11
different tests, and you analyze the
24:13
responses generally with another AI model,
24:15
which is kind of funny, AI
24:17
judging other AI responses, and you
24:19
score them and you can tell
24:22
in this really scientific methodology whether
24:24
it's better or worse to go
24:26
to a different model and like
24:28
how does it handle typical questions
24:30
that we have. But this is Kyler-ware,
24:33
where I just do it and I'm
24:35
like, oh yeah, that seems like a
24:37
better answer to me. Let's use that
24:39
one. In the future, we'll probably do
24:41
both automated measuring of sort of a
24:43
formal methodology of what's better, worse
24:46
for different models and standards and
24:48
parameters, but also something called data
24:50
grounding, where you can give the
24:52
correct answers to binary questions. So
24:54
like, what color is a stop
24:57
sign? It's red and white. And
24:59
so you can have it measure
25:01
whether that answer is accurate. And
25:03
you can provide it like hundreds
25:06
or thousands of questions where it
25:08
has to get the answers right.
25:10
And those responses can be measured in
25:12
real time. That's a new thing in
25:14
bedrock. I don't have that turned on
25:17
yet, but I want to. I just
25:19
need to write some binary questions, ones that
25:21
sort of have a real answer,
25:23
not gut-feeling
25:26
style. And it'll be able to measure
25:28
those responses from the model, whether they're
25:30
factually correct. So it's a different AI
25:32
that spins up in real time and
25:34
measures the response back to the user
25:36
and says, like, oh, this is accurate
25:38
enough. It passes my threshold of, I'm
25:40
going to let it go back, versus
25:42
this is total nonsense. This disagrees with
25:44
the things I know are true. I'm
25:46
going to block it and send an
25:48
error message instead. That's much more useful
25:50
for user facing stuff that's thousands of
25:52
responses a day, but I'm learning how
25:54
it works so I can do that
25:56
cool stuff one day. Yeah, so you're saying there
25:58
are, like, rigorous methods for testing, but
26:01
this is a lab project, so vibes
26:03
suffice. It's vibes, exactly. The vibes are
26:05
good, so we're building the thing. And
26:07
I'm kind of just bolting these on.
26:09
This is definitely one of
26:11
those projects where it's resume-driven
26:14
development. I wanted to just
26:16
learn how it worked, and I came up
26:18
with an excuse. And so far, that's working
26:20
great. I'm not a PhD, I'm not
26:22
a math whiz, but I'm an ops kid
26:24
that likes to play with software.
26:26
And so far, that's good enough. You
26:28
said that the model you selected
26:30
was better with programming responses specifically.
26:33
Was that a seat-of-the-pants vibe
26:35
kind of thing? Like, I don't
26:37
really like the answer I got
26:39
from this other model, but this
26:41
one, yeah, Anthropic's really doing it right.
26:45
Yeah, I measured the Anthropic Claude Sonnet
26:45
versus AWS's Titan model. I think
26:48
that's the name of their model,
26:50
their newest sort of general AI
26:52
text processing model. And I asked
26:54
it specific SRE-type questions, software
26:57
engineer questions, and the AWS model
26:59
said, you know, you should probably talk
27:01
to a software engineer. And I'm like,
27:03
no, I know I can talk to
27:05
a software engineer. I'm talking to you.
27:08
Give me your best answer, particularly about
27:10
questions of like, how do these
27:12
AWS services work, which I feel like AWS's
27:14
AI should probably be trained on that a
27:16
little more. They should know how the AWS
27:18
cloud works. I'm just saying. So yeah,
27:20
I just tested a couple of different
27:23
models with different programming questions, sort
27:25
of like an interview. It's kind
27:27
of like I'm interviewing them for
27:30
a job, which is a really
27:32
apt analogy here. I noticed in
27:34
one of your posts that there is some
27:36
model tuning you could do to get the
27:38
sort of answer that you're looking for. Like
27:40
you wanted it to give you kind of
27:42
an engineering friendly answer with, with not too
27:45
many hallucinations, but also not too restricted. Because
27:47
if I remember right, the way you wrote
27:49
the post, if you can tune it in
27:51
such a way that you can hardly get
27:53
anything useful out of it, but if you
27:55
let it go crazy, you'll get a lot
27:57
of bogus data. How do you do that tuning?
27:59
It's so fascinating, it's so little like
28:02
programming and so much like talking to
28:04
maybe like a junior style engineer that's
28:06
not confident in themselves, because if you
28:08
are opinionated enough, it will agree with
28:11
you, no matter what. And I am
28:13
a confident person, and I've had this
28:15
trouble with junior engineers before, where I
28:18
say something so confidently and so wrong,
28:20
and they'll agree with me, because like,
28:22
you're so confident, you must know what
28:24
you're talking about, and that's not true.
28:27
I just come across that way. You
28:29
are able to set a couple of
28:31
parameters for most of these models and
28:33
the parameters sometimes differ but the big
28:36
ones that people should know about are
28:38
temperature and top P and temperature is
28:40
from zero to one and it's the
28:42
amount of creativity sort of freewheeling that
28:45
you permit the model to do and
28:47
you can sort of turn the creativity
28:49
all the way up to one. Oh
28:51
and it will just make stuff up.
28:54
Which, like, we've all met engineers that
28:56
do that? Maybe me too. And... what
28:58
qualifies as making stuff up? I mean, it's
29:01
not going to be purely a random
29:03
answer. It's still an LLM. It's still
29:05
following some kind of context or, you
29:07
know, language chain. And to give you
29:10
words that in theory should be plausible,
29:12
it's so it's not just making things
29:14
up, right? I think it's the amount
29:16
of reward that the model is given
29:19
for agreeing with you. and for telling
29:21
you positive answers. So does the moon
29:23
go around the sun? And if the
29:25
temperature's one, oh, it'll say, of course
29:28
it does, and it'll explain how Galileo
29:30
proved that the moon
29:32
goes around the sun. And like, that's
29:34
not true, but the model's reward for
29:37
saying yes is high. So it'll do
29:39
it. It'll be rewarded for just lying
29:41
to your face. So what we've done
29:44
for this, this is supposed to be
29:46
a model that doesn't lie. It grounds
29:48
its information based on what's real and
29:50
not just what I want to hear,
29:53
which is... very much preferable in an
29:55
engineering context, is turn the temperature way,
29:57
way down. I'm currently at point one.
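[Editor's note: for anyone wiring this up themselves, here is a sketch of how a low temperature like that might be passed to Bedrock's Converse API. The model ID is illustrative, and this only assembles the request kwargs; no AWS call is made.]

```python
import json

# Sketch only: build the kwargs you would hand to bedrock_runtime.converse().
# The model ID below is illustrative.
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def build_converse_request(user_text, temperature=0.1, top_p=0.9):
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {
            "temperature": temperature,  # low = grounded, high = freewheeling
            "topP": top_p,               # nucleus-sampling cutoff
        },
    }

request = build_converse_request("Does a basketball fall to the earth because of gravity?")
print(json.dumps(request["inferenceConfig"], sort_keys=True))
# → {"temperature": 0.1, "topP": 0.9}
```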
29:59
I could probably go smaller. I think
30:02
I can go to hundredths of a place and
30:04
not just tenths. But the problem is
30:06
that when you get really low, it
30:08
stops being able to kind of make
30:11
this sort of inference-style reasoning, where:
30:13
If it knows that a marble falls
30:15
to the earth because of gravity and
30:17
its temperature is zero, and you say,
30:20
does a basketball fall to the earth
30:22
because of gravity? And it'll say, no,
30:24
I don't have information to back that
30:26
up. I can't make a deduction or
30:29
an inference. I know marbles do, but
30:31
I can't infer that anything else is
30:33
also subject to gravity. Right. Exactly. Which
30:36
is unreasonably grounded in reality. And so
30:38
really you want it to be able
30:40
to make some inferences. If you can
30:42
write a loop in Python, you can
30:45
probably write a loop in Bash, and
30:47
I'll go find out how. So you
30:49
want the temperature to be a little
30:51
bit high, a little bit up. Again,
30:54
I started with like point three, and
30:56
I'm trying to get it to point
30:58
one. It still makes stuff up sometimes.
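[Editor's note: as a toy illustration of what these two knobs do, not Bedrock's actual sampler, the standard softmax-with-temperature trick makes the same raw next-token scores near-deterministic at a low temperature and much flatter at a high one, and a top-P cutoff then drops the long tail.]

```python
import math

# Toy demo: divide raw model scores (logits) by the temperature before softmax.
def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_cutoff(probs, p):
    """Keep the smallest set of highest-probability tokens whose mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept

tokens = ["falls", "floats", "explodes"]
logits = [4.0, 2.0, 0.5]                       # made-up scores

cold = softmax_with_temperature(logits, 0.1)   # p("falls") is essentially 1.0
warm = softmax_with_temperature(logits, 1.0)   # probability spreads out
print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])
print([tokens[i] for i in top_p_cutoff(warm, 0.9)])  # tail token dropped
```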
31:00
AIs just do, at this point
31:03
in the state of
31:05
the art. But you can also set
31:07
the top P, which is the number
31:09
of tokens it'll consider for the next
31:12
choice. So like how randomly it chooses
31:14
the next token. So if Top P
31:16
is like 25 words, it's considering 25
31:19
tokens using its temperature algorithm. And I
31:21
hope this is all accurate. If there's
31:23
AI folks out there that got it
31:25
wrong, I'm doing my best. But it's
31:28
been working so far. So yeah, that's
31:30
where we're at. So
31:32
we talked a little bit about
31:35
guardrails, but I'd like to dig into
31:37
that a little bit more. Again, guardrails
31:39
are essentially like controls on what the
31:41
model will respond to based on prompts.
31:44
And can you talk about, you know,
31:46
what kind of guardrails are available and
31:48
what you were interested in? Yeah, absolutely.
31:50
So those exist. I imagine in most
31:53
of these hyperscaler platforms, but specifically, Azure
31:55
and AWS have the concept of guard
31:57
rails, or model blocking, I think it
32:00
might be called in Azure, where you
32:02
can give it specific things that it
32:04
shouldn't talk about. Categories like profanity, like
32:06
if anyone curses at the AI, you
32:09
can block it on the input or
32:11
block it on the output. You don't
32:13
want the model cursing at people or
32:15
nudity or violence, like don't explain how
32:18
to make C4, please. Like maybe you've
32:20
been trained on that data. Please don't
32:22
explain that in the context of my
32:24
business app. Something that our legal team
32:27
in particular asked me to do was
32:29
make sure that it won't give financial
32:31
advice. Because it sort of seems like
32:34
it's speaking for the company, right? If
32:36
you have any kind of AI, like
32:38
this, that was that famous story in
32:40
Canada where there was a car dealership
32:43
that had an AI
32:45
channel of support, and it promised it
32:49
would give them a car for $10.
32:52
And they were sued, and I can't
32:54
remember how that worked out. Or: should I
32:56
buy your stock? Is it going to
32:58
go up next week? It might actually
33:01
have information that is accurate on that
33:03
question and it's also highly illegal for
33:05
it to give that information to anyone.
33:08
So we cannot do that. So something
33:10
cool that you can do on the
33:12
AWS guardrail side is you can give
33:14
it example questions. This isn't a category.
33:17
Financial advice is not a category like
33:19
profanity and nudity and violence, but you
33:21
can give examples of questions that it
33:23
should not answer. And responses that it
33:26
should give instead. So we wrote a
33:28
couple of questions like that for financial
33:30
advice for stock investment for the future
33:32
of the company in terms of growth
33:35
or sales and said that I'm sorry
33:37
I'm not authorized to speak on behalf
33:39
of this company. So just sort of
33:42
catch all responses that say like I'm
33:44
not going to actually give you this
33:46
answer. And it's interesting because it's
33:48
not trained into the model. It's not
33:51
part of the model. It's. a guardrail
33:53
that just processes every in and out
33:55
using AI. Like, it's using generative AI
33:57
as a totally separate process, as a
34:00
layer to measure your question in and
34:02
your response out to see whether they
34:04
fit your parameters of what you permit.
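[Editor's note: a sketch of what such a guardrail definition looks like, in roughly the shape boto3's bedrock.create_guardrail() accepts. Field names follow the Bedrock API as I understand it, the values are illustrative, and no AWS call is made here; it is built as plain data.]

```python
# Sketch of a guardrail: category filters checked on input and output,
# plus a denied topic with example questions and a canned refusal.
guardrail = {
    "name": "slackbot-guardrail",
    "topicPolicyConfig": {
        "topicsConfig": [
            {
                "name": "financial-advice",
                "definition": "Requests for investment or stock advice about the company.",
                "examples": [
                    "Should I buy your stock?",
                    "Is the stock going up next week?",
                ],
                "type": "DENY",
            }
        ]
    },
    "contentPolicyConfig": {
        "filtersConfig": [
            # checkbox-style categories with low/medium/high strengths
            {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        ]
    },
    "blockedInputMessaging": "I'm sorry, I'm not authorized to speak on behalf of this company.",
    "blockedOutputsMessaging": "I'm sorry, I'm not authorized to speak on behalf of this company.",
}
print(len(guardrail["topicPolicyConfig"]["topicsConfig"]))
```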
34:06
So my assumption around guardrails is it
34:09
was sort of like you know when I
34:11
get web filtering services from a security company
34:13
I can check all the boxes no hate
34:16
speech no gambling no whatever and I don't
34:18
have to go out and find all of
34:20
the URLs associated with that they're doing it
34:22
for me. I assumed it was the same
34:25
with guardrails is that the case or it
34:27
sounds like I can also program in very
34:29
specific rules. Yeah, that's exactly true is
34:31
what you said right at the end. It
34:34
both does these categories that you
34:36
don't have to train it on all the
34:38
words that qualify as profanity. You can check
34:40
the box and set it. I think it's
34:43
like low medium high. It would be an
34:45
interesting day to program that in. I would
34:47
do it. I think it'd be fun. And
34:49
you can also give it these sort of
34:52
AI generative questions and answers
34:54
that it should be providing. So
34:56
it's sort of going beyond just
34:58
what it supports to block traffic.
35:00
So it sort of works like
35:02
a WAF in the sense that
35:04
it's finding specific things, but it's also
35:07
finding similar things that qualify.
35:09
So if it's detecting that it
35:11
seems like profanity, it will be
35:13
blocked by the profanity filter, which
35:15
is pretty cool. It occasionally is
35:18
a little overzealous. Some of our finance
35:20
team wants to talk about like, how do
35:22
I find a credit card number that I
35:24
can use to check out, you know, in
35:26
our demo environment? And then it's saying,
35:28
I'm not going to give you
35:30
a credit card number, obviously. And
35:33
so we've had some edge cases
35:35
where we have to kind of
35:37
tweak it for, you know, the
35:39
bizarre things that developers have to
35:41
do to make apps actually work.
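[Editor's note: the demo-environment credit card case is a good example of the nuance a category filter misses. Checkout forms typically validate numbers with the Luhn checksum, and publicly documented test numbers such as 4242 4242 4242 4242 pass it without belonging to anyone. A quick sketch of that check:]

```python
# Luhn checksum: the validation most checkout forms run on card numbers.
def luhn_ok(number: str) -> bool:
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_ok("4242424242424242"))  # a widely used test number → True
```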
35:43
Well, you need to make exceptions
35:45
for Australians in the case of
35:47
profanity because what most of us
35:49
would consider profanity is just everyday
35:51
speech for the average Australian. Absolutely.
35:53
Maybe there's an Australian mode. I
35:55
haven't seen it yet, but AWS,
35:57
please build that. Context. Conversation context.
36:00
That's really important for that
36:02
human-like experience when chatting with
36:04
the bots. So how do we get
36:06
context? Well, that's an interesting problem
36:08
here, because you're building a conversation, which
36:11
is a series of person A, or
36:13
a user, speaking to person B, or
36:15
the system. And you can have lots
36:17
of conversation turns, but that's what you're
36:19
providing to bedrock and saying, you know,
36:22
here's the whole conversation that's previously
36:24
happened. Please generate a response
36:26
using this context. And at
36:28
first I built this to just
36:30
read everything in a direct message
36:33
thread like all of the conversation
36:35
you've had, which can be hundreds
36:37
of turns on all sorts of
36:39
topics, and the AI went crazy because it
36:42
got really confused. First of all,
36:44
there's just too much context for
36:46
it to process in a reasonable amount of
36:48
time. But also if you're asking about
36:50
topic A and then topic B and
36:52
then topic C, those are kind of
36:55
related, but passing all that information
36:57
at once to someone, you would
36:59
confuse any human with so much
37:01
context immediately and it confused the
37:03
AI right away. So I decided
37:05
to kind of bound a conversation
37:08
context, in the same way
37:10
that Slack sort of does
37:12
natively, which is called threads.
37:15
Threads are sort of these
37:17
child objects in direct message.
37:19
And so it's not like
37:22
a parent-level message, message, message.
37:24
It's a child, a child, a
37:26
child beneath a message. And so
37:29
I just read the entire context
37:31
of the thread. And we also
37:33
look up all of the user
37:35
information. So find your real name
37:37
Drew Conry-Murray and your pronouns
37:40
if you set them in slack.
37:42
So it can speak more naturally.
37:44
It was using they/them for
37:47
everyone, which was bizarre. And it's
37:49
able to, because it's reading that whole
37:51
thread and passing it forward to
37:53
bedrock, it's able to understand who's
37:55
speaking. So if Ethan and Drew
37:57
are arguing about something, it's able
37:59
to understand who has opinions about
38:01
what. And it can kind of
38:03
help settle arguments or summarize the positions
38:06
of the different people on the thread
38:08
and who agrees with who and who
38:10
thinks blah blah blah. But that's something
38:13
I didn't expect people to do. They
38:15
immediately started using it to summarize these
38:17
really long slack threads of these
38:19
two experts arguing for 50 conversations
38:21
and then you come in at the
38:24
bottom and you're like, you know, it's
38:26
that meme where you walk into the
38:28
room with pizza and everything's on
38:30
fire and you're like, what happened
38:32
here? And so you can ask the
38:35
AI, like, please read this whole thread
38:37
and tell me in 50 words or less what they're
38:39
talking about. And it can do
38:41
that because it's reading thread and
38:43
getting all the context of who's speaking
38:46
and what they've said, including any documents
38:48
that are attached. Documents are a whole
38:51
other challenging ball of wax, but
38:53
primarily that's how context is working.
38:55
That's all something that I just kind
38:57
of made up, that it makes sense
38:59
to me in Slack threads being a
39:02
conversation boundary. So let's just use
39:04
that as a conversation for bedrock.
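[Editor's note: a sketch of that assembly step, turning a Slack thread into the alternating user/assistant turns the Converse API wants. The Slack field names match conversations.replies output, but the role-merging rule and the display-name lookup are guesses at the approach described, not Kyler's actual code.]

```python
# Turn a Slack thread into Converse-style messages, tagging each human
# speaker by real name so the model knows who said what.
def thread_to_messages(thread, bot_user_id, display_names):
    messages = []
    for msg in thread:
        if msg["user"] == bot_user_id:
            role, text = "assistant", msg["text"]
        else:
            name = display_names.get(msg["user"], msg["user"])
            role, text = "user", f"{name} says: {msg['text']}"
        # Converse expects alternating roles; merge consecutive same-role turns
        if messages and messages[-1]["role"] == role:
            messages[-1]["content"][0]["text"] += "\n" + text
        else:
            messages.append({"role": role, "content": [{"text": text}]})
    return messages

thread = [
    {"user": "U1", "text": "Is the VPN to Chicago up?"},
    {"user": "U2", "text": "I think it flapped an hour ago."},
    {"user": "UBOT", "text": "Checking the last alert in this channel..."},
]
msgs = thread_to_messages(thread, "UBOT", {"U1": "Ethan", "U2": "Drew"})
print([m["role"] for m in msgs])  # → ['user', 'assistant']
```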
39:06
So that's me using the
39:08
Slack bot, you know, I'm interfacing with
39:10
Slack. I need to know: each conversation
39:13
I have with this Slack bot
39:15
needs to be threaded in order
39:17
to have context. Is that true? Yeah,
39:19
so you'll either tag it in a
39:21
thread or in a parent message and
39:24
it will respond in a thread
39:26
so it sort of guides you
39:28
towards using this model. You don't have
39:30
to memorize that. And that's the sort
39:32
of context that's passed in in
39:34
real time, and it's all built
39:36
so it doesn't keep it. But we
39:39
also have training data, the knowledge base
39:41
that it can look up. And so
39:43
we use this first phase of
39:45
the conversation where you read the
39:47
entire thread, which can be really long,
39:50
right? People get verbose in slack, or
39:52
at least I'm very chatty in slack.
39:54
And it'll look at our knowledge
39:56
base, which is all the data
39:58
we have trained it on. It's a
40:01
vector database called OpenSearch in AWS, which
40:03
is like a vector database platform. And
40:05
that's where all of our knowledge
40:07
is stored. And it sort of
40:09
finds related conversational vectors. So like related
40:12
to the topics you're talking on, in the
40:14
information you've trained it on. And we
40:16
pass that information as additional context.
40:18
And I'm just doing additional conversation
40:20
turns that say like, hey, this is
40:23
a knowledge-base entry, please use this. And
40:25
then phase two is you actually talk
40:27
to the model with that assembled
40:29
thread of the user's request, the
40:31
thread that it's in, the conversational knowledge-base
40:34
information that we've retrieved, and that whole
40:36
package is given to the AI to
40:38
say like, hey, please make sense
40:40
of this and give us a
40:42
response. A quick observation
40:45
here in your notes you mentioned
40:47
you were running bedrock in U.S.
40:49
East 1, AWS US East 1,
40:51
but it was kind of broke
40:53
and you ended up using U.S.
40:55
West 2 and it's been working
40:57
great ever since. What was your
40:59
experience with it being broke? Were
41:01
you just getting errors or strange
41:03
responses? It was just giving me
41:05
errors that my things were malformed.
41:07
My API requests were malformed despite
41:09
them exactly matching the doc. You
41:11
try to troubleshoot your code, like
41:13
maybe I've done it wrong. I've
41:15
done it wrong many times before,
41:17
but this exactly matches the document
41:19
example. And I talked to our
41:22
TAM and some friends, and they
41:24
said, well, you know, East One
41:26
breaks sometimes with bedrock. East One
41:28
is an overloaded region. There's a
41:30
lot going on there. But West
41:32
Two is, it gets the new
41:34
stuff first, because I don't know
41:36
why. So try that. And I
41:38
flipped over to that region and
41:40
it worked right away with no
41:42
code changes. I just pointed it at
41:44
a new place. So since then
41:46
I've left it. So my lambda
41:48
runs in East One and it
41:50
uses the service in West Two. That's
41:52
called cross region inference, but it
41:54
works fine and it's free. So
41:56
I just kind of left it
41:58
there. It's a little awkward jumping
42:01
around the console regions to read
42:03
the logs for different services, but
42:05
it's not annoying enough for me
42:07
to fix it, so it stays. Speaking
42:09
of free, you mentioned earlier that
42:11
overall this project's been pretty inexpensive,
42:13
but I am scared to death
42:15
of running up my AWS cost.
42:17
Can I, is there a way
42:19
I can guard against costs getting
42:21
out of control if I'm using
42:23
bedrock or lambda? Yeah, absolutely. You
42:25
can set alerts in your Cost
42:27
Explorer that trigger and will email
42:29
you if you're beyond like $10
42:31
a month or $20 a month,
42:33
or if your projection is higher
42:35
than that. But generally in Bedrock,
42:37
it's so inexpensive, I would recommend
42:40
you could try it. And Lambda
42:42
similarly costs almost nothing. I think
42:44
Lambda costs like a dollar a
42:46
month for 400 requests. The real
42:48
cost is the knowledge bases. You
42:50
have to be very careful with
42:52
that. I trained it on about
42:54
40 gigabytes of confluence data. And
42:56
you would expect storing 40 gigabytes
42:58
in a database would cost, you
43:00
know, maybe $100 a month. I
43:02
don't know. That's a napkin math.
43:04
It's around $1,200 a month. So
43:06
it's... it's like $25,000 a year
43:08
or something like that is what
43:10
it initially cost. I've been fiddling
43:12
with it to get the math
43:14
down and we're still around like
43:16
14 grand a year to store
43:18
40 gigabytes of data in a
43:21
database. So that is significant,
43:23
especially for an internal tool
43:25
that's not generating revenue. I'm just
43:27
in a cost center building stuff. And
43:29
that's... so, be very wary of
43:31
knowledge bases because they're very expensive,
43:33
but the lambda and the bedrock
43:35
so far have cost almost nothing,
43:37
a couple of Starbucks a month
43:39
and you're good to go for
43:41
an AI bot. In your
43:43
series, as you've been writing about
43:45
this, you made a big deal
43:47
about Lambda being all about you
43:49
don't want to have to think
43:51
about infrastructure ever, and
43:53
so, Lambda. But let's say I'm
43:55
okay with managing some infrastructure,
43:57
I've got a server lying around. Out
44:00
of your architecture, what you've designed here, what
44:02
processes would be running on
44:04
a server, and would running on a server
44:06
simplify this thing, or am I
44:08
just kind of moving complexity around?
44:10
I think you're moving complexity
44:12
around a little bit, for a couple of
44:16
reasons. So first of all, it has to
44:18
be exposed to the internet somehow because the
44:20
slack servers are on the internet. So either
44:22
you need something like an ngrok that's doing
44:25
this sort of piping of public to private
44:27
to get to your server, or you need an
44:29
ALB to receive the traffic, which is
44:31
going to cost you more than bedrock
44:33
is costing per month, even with no
44:35
use. I think it's $16 a month,
44:37
even if you have no services at
44:39
all. And you also need to
44:41
handle the authentication, because you're using
44:43
IAM authentication. I was going to
44:46
say it expired after a few months,
44:48
but with an implicit IAM role, I
44:50
think that would actually be solved. So
44:52
you would just have to handle ingress,
44:54
and it would work just fine. I think
44:56
that's all. Yeah. There's a main function in the code.
44:59
I think in the show notes we'll link you to
45:01
the GitHub if you want to check it
45:03
out. There's a lambda handler and there's a main
45:05
function handler and they're written in such a
45:07
way you can just run the code and
45:10
it'll detect your sort of context. And if
45:12
you're just running on your computer and you
45:14
have all the things installed, it'll work
45:16
fine. As opposed to the webhook
45:18
reaching out to that Lambda URL and
45:20
firing it up, the
45:22
webhook would instead hit my server
45:25
instance, and it would run from
45:27
there, or just be sitting there
45:29
live, waiting. Yep, absolutely. So it's
45:31
an either-or. You've built something that's
45:34
standalone no infrastructure required don't have to
45:36
upgrade don't have to maintain and it's
45:38
pretty cheap I looked at it not being
45:40
someone who spends much time in cloud or
45:42
writing Terraform, and I've never written a
44:45
Lambda function in my life, going, ah, this
45:47
all seems a little intimidating, but like a
45:49
service running on a server, that I know,
45:51
that I'm really comfortable with. But as you
45:53
say, it is just moving things around. Now
45:55
I've got, now I've got a process living on
45:57
a server, and now I've got to give it care
46:00
and feeding. Yeah, absolutely. It becomes
46:02
a pet. The Lambda version is
46:04
cattle. If it goes crazy and throws
46:06
an error, we kill it and we get
46:08
a new one and that's an unfortunate metaphor.
46:10
But if it's on a server, it
46:12
has to be your pet. You're monitoring
46:14
the CPU. You make sure the
46:16
disk doesn't fill up. Have you
46:18
patched it recently? You probably should.
46:20
Do we have any kind of
46:22
SRE infrastructure in place to monitor? Is
46:24
the CPU getting high, and is it slowing down
46:27
because it's handling too much? Do we have
46:29
antivirus on it? You just sort of have
46:31
to handle all of that stuff in this
46:33
sort of pet world where you have your server
46:35
and you have to care and feed it. And
46:37
then the final follow-up question for that
46:39
you've talked about ngrok. And it sounds
46:42
like it's a gateway for to go
46:44
between a public network and a private
46:46
network, kind of. If we haven't heard
46:48
of ngrok before, what is this thing?
46:50
What does it do? Totally. I hadn't
46:53
before I built this project, but I
46:55
had this public web hook coming from
46:57
Slack because you're messaging in and that's
46:59
what happens when you tag your bot.
47:01
It generates a web hook and it
47:04
sends it somewhere. And I had to
47:06
get it to my computer, which is,
47:08
you know, inside my internal network. I
47:10
didn't want to give myself a public
47:12
IP or anything. And I just have
47:14
a silly router, not a Cisco ASA
47:17
or something to do like a static
47:19
NAT to shield me from the internet. So I
47:21
needed to get that to my computer.
47:23
And this ngrok service, it's an
47:25
open source tool and platform that
47:27
lets you do one URL forwarding
47:30
concurrently to your private computer and
47:32
it sort of builds a tunnel
47:34
from the ngrok service to
47:36
your computer from public to
47:38
private. And it gives you some
47:40
insight into each HTTP connection, like
47:43
the code that you're receiving and returning.
47:45
So like a 200 is a happy
47:47
little HTTP packet. And you can see the
47:49
latency and how much traffic you're getting
47:51
and stuff like that. And it's
47:53
just this cool little open source
47:55
tool and platform that makes
47:57
developing locally against a publicly
48:00
generated webhook super easy.
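[Editor's note: ngrok itself is the tunnel, roughly `ngrok http 3000` on the laptop, but the local side it forwards to is just an ordinary HTTP listener. A self-contained sketch of such a receiver handling Slack's url_verification handshake; the port and route are illustrative, and the demo sends itself one request instead of waiting for Slack.]

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class SlackWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Slack's URL verification handshake: echo the challenge back
        if body.get("type") == "url_verification":
            reply = json.dumps({"challenge": body["challenge"]}).encode()
        else:
            reply = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), SlackWebhookHandler)  # 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Simulate the webhook Slack would send through the ngrok tunnel
port = server.server_address[1]
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/slack/events",
    data=json.dumps({"type": "url_verification", "challenge": "abc123"}).encode(),
    headers={"Content-Type": "application/json"},
)
response = json.loads(urllib.request.urlopen(req).read())
print(response["challenge"])  # → abc123
server.shutdown()
```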
48:02
I doubt it's secure enough for
48:04
an enterprise implementation. I
48:06
wouldn't build your whole bot like that.
48:08
But you probably could. Maybe you
48:11
could run ngrok to get your
48:13
public access to your local development
48:15
environment. Try it out. The key
48:17
piece of it, it sounds like you said there's
48:20
an ngrok service. So there's some ngrok service
48:24
living out in the cloud that's going to be
48:26
basically a proxy. I'm going to send my web
48:28
hook, it's going to land on the ngrok service,
48:30
which is going to go, oh, I know where
48:32
this goes, it's on the other side of this
48:35
tunnel, to Kyler's, sitting inside of her enterprise, and
48:37
sends it there. Exactly, and it's listening
48:49
on localhost, port 3000 or something
48:49
like that. And you just tell
48:51
the ngrok, like, accept traffic on
48:54
443, securely, and then tunnel it
48:56
securely to me, and drop it on
48:58
port 3000 localhost. And so your
49:00
Python script receives the traffic. It's wild,
49:02
because it's so complex. But I probably
49:05
spent 15 minutes googling it, and then
49:07
I turned it on, and it worked
49:09
right away, and I've never had an
49:11
issue with it. It's
49:14
incredible for really
49:16
rapid development. So we talked earlier about,
49:18
you know, the model you chose and
49:20
it was from Anthropic, a Claude version.
49:23
But you also, so that's, that Claude
49:25
version is trained on some essentially public
49:27
data set, but you wanted to augment
49:29
it with internal data, which you
49:31
said you're coming from confluence. Is
49:33
that what RAG means? Retrieval-augmented
49:35
generation? Is that what this was
49:37
or something else? That's exactly what
49:39
it is. So for folks that,
49:42
you know, aren't AI engineers,
49:44
rag is retrieve and generate
49:46
or retrieval augmented generation, which
49:48
means using AI to construct
49:50
vectors. So you take this unstructured
49:52
data, which is the slur
49:55
that AI engineers use for stuff
49:57
that's written for humans, like you
49:59
have a document, you have a
50:01
chart, you have a, you know, your
50:03
Excel spreadsheet that's written for you to
50:05
understand it. It's not written for an
50:08
AI model to understand it and sort
50:10
of convert that data to a format
50:12
that's understandable by a, you know, a
50:14
vector database by a model. But there
50:17
are models that are specifically an embedding
50:19
model. That's what it's called. They're specifically
50:21
built to take unstructured data and
50:24
store it in vector databases in
50:26
a format that's compatible with models. And
50:28
so it read all of our confluence.
50:31
It also supports S3 on the AWS
50:33
side and you can upload whatever. Cool
50:35
side benefit of that is when you
50:37
upload stuff, it triggers the knowledge base
50:39
to read it right away versus the
50:42
confluence side. You have to go click
50:44
the button that says read confluence again
50:46
today, which is, I hope they solve
50:48
scheduling in the future, but right now
50:51
you have to click a button to
50:53
say read it again. And there's
50:55
also others supported, like SharePoint.
50:57
So I'm just going to
50:59
keep scaling this. The way that we're
51:01
starting to frame this project internally is
51:03
all of this data is already accessible
51:06
to all of our users. You
51:08
can go to confluence yourself and
51:10
read the website or SharePoint or
51:12
Slack or our PDFs for our
51:14
customer service agents or our PagerDuty
51:16
resolutions for our SREs.
51:18
And all of those services individually
51:20
have AI models you can pay for. A lot
51:22
of companies have risen to this, but
51:24
they're all seat licenses and they're all
51:26
separate. And so if you're paying for
51:29
like $10 a month per user per
51:31
platform, that's like, I don't even know,
51:33
the math just goes crazy. So
51:35
even if we're paying $14,000 a year
51:37
to have this knowledge base exist, if
51:39
I can put data from all of
51:41
these disparate services in one place, then
51:44
this model can make some pretty
51:46
informed decisions if it can read
51:48
your pager duty and your share
51:50
point and your confluence and your
51:52
slack and maybe all your PDF
51:54
of, like, how to resolve stuff.
51:57
So that's, I think, the pitch:
51:59
what could an AI model do
52:01
that's very accessible for your users and
52:04
private and has read all of your
52:06
internal infrastructure documents? And maybe your configs
52:08
too. I don't know what we can
52:11
train it on, but we're going to
52:13
put a lot in there and see
52:15
what happens. And just a side note,
52:18
we've been talking about vector databases. I
52:20
didn't know what that was. I had
52:22
to look it up before the show.
52:24
So my understanding is that the thing
52:27
that's cool about a vector database is
52:29
that it can sort of... If I
52:31
put in a query about smartphones, it
52:34
will return information that it found related
52:36
to also mobile devices and cell phones,
52:38
as opposed to just keying off the
52:41
specific word smartphone. That's the benefit of
52:43
a vector database. Is that correct? Yeah,
52:45
that's my understanding too. Keywords work the
52:47
best. It still shows pretty solid deference
52:50
for like an exact match of a
52:52
keyword. But yeah, it's finding related topics
52:54
and in much the same way your
52:57
brain would when we say phone, you
52:59
think like Android, iOS, blah, blah. Yeah,
53:01
the long-lost Blackberry. Yeah. So, getting
53:04
your Confluence database into the model, did
53:06
you have to do anything special to
53:08
prepare the data or was just like,
53:11
here you go and you take care
53:13
of it? No, it worked really well.
53:15
This is a beta data source that
53:17
is supported by AWS Bedrock's team. So
53:20
you create a knowledge base and then
53:22
you add data sources to it. The
53:24
knowledge base basically is one-to-one with an
53:27
OpenSearch database that's just running in
53:29
the background charging you money. And the
53:31
data sources are these sort of scheduled
53:34
automated processes to go scrape something and
53:36
put the data, shove the data into
53:38
your OpenSearch database, into your knowledge
53:40
base. And so you can add lots
53:43
of data sources into one knowledge base.
53:45
So confluence is supported. It's still in
53:47
beta. It still doesn't read a lot
53:50
of your specially structured data. So bedrock
53:52
itself supports PDFs and documents and Excel
53:54
and so on. But this specific ingestion mechanism
53:57
for confluence doesn't. So each time I
53:59
run it, it shows 80% failure. It's
54:01
like 200,000 failures of 250,000 documents, like
54:03
70 or 80%. And it still works.
54:06
It's ingested all the text, but it
54:08
doesn't ingest the binary files, the structured
54:10
data files. So hopefully that's coming. I'm
54:13
sure that'll be solved by them in
54:15
the future. But what we're gonna do
54:17
for other data, like... PDFs that we
54:20
upload to S3 or data we're scraping
54:22
from internal wikis and putting in there
54:24
is this structured data file type where
54:27
you say the source is this URL
54:29
even if it's not accessible to the
54:31
model. You're just storing it in S3.
54:33
And so when your model references that
54:36
data, you say, read this thing, and
54:38
it says, oh, you know, it comes
54:40
from this PDF. It doesn't say,
54:43
go read the PDF in the S3
54:45
human person. It says, go to this
54:47
URL that you have access to, this
54:50
internal wiki URL that it hasn't read, because
54:52
you can access it there as a
54:54
human, but our model is not able
54:56
to support that yet. So that's probably
54:59
how we're going to support a lot
55:01
of internal data that's private. That means
55:03
duplicating data, which isn't great, but being
55:06
able to put it in this model
55:08
is pretty cool and powerful. So I
55:10
think we're going to explore that. With
55:13
that data ingestion that you were just
55:15
describing, rag and augmenting, the core model
55:17
with all of this specific data that's
55:19
specific to your organization. I'm a network
55:22
engineer and I want to be able
55:24
to ask the model about the state
55:26
of the network in real time. Can
55:29
you imagine a scenario where this, I
55:31
don't know, some kind of a telemetry
55:33
feed or something where the model can
55:36
kind of keep up with the state
55:38
of the network? And so then I
55:40
can ask questions and it'll tell me
55:43
what's going on in New York, you
55:45
know, these kind of things. Yeah, absolutely.
55:47
55:47
There's so much to this question, and there's so much I don't know yet that I want to test out. But generally, these types of models are trained on general data. They're not trained on the state of the network. They're trained on how Terraform works, what the AWS service names are, et cetera. And you can pass it information in real time, like, hey, we got an alert in our Slack that a VPN went down, can you tell us what to do? But it's not reading your config. It's not SSHing into your firewall and looking at state data. It's just reading the log in Slack and giving you basic information about it, which is not great and not what we want.

56:29
So there is this concept you'll see if you start reading about AI stuff called an agentic AI agent. And it's a very fancy term that just means you give the model the ability to do stuff. So I can absolutely see a use case for asking the model, is the VPN to Chicago up on this firewall? And giving it the ability to, in the background, SSH to your device, read your list of firewalls, look for one of the tunnels that has a description of Chicago, and see its state. That will all take some sort of custom building. I haven't gotten to building agentic stuff yet, but that's supported by Bedrock, that's supported by Azure AI, and I imagine it's supported by GCP. That's going to be the next generation of AI stuff. It's still very, very new.
57:17
57:17
It will develop significantly in the next year or two, and hopefully we'll have some sort of pre-built puzzle pieces for us to SSH to something from an internal endpoint, read the data, and add it as a conversation turn so the AI can understand it. But that's for sure coming.

57:35
One of the internal projects that I'm going to be building in a hack day pretty soon is to give an AI model an internal network architecture diagram, so like a PDF that shows all the system names and hosts and subnets and stuff like that. And then hopefully a user will be able to talk to the bot and say, my IP is this, I'm going to this destination IP on this port number, is it accessible? And what I'm hoping the AI will be able to do is understand where the user is in the network diagram and where the destination is, and then look at all the interstitial nodes in the middle and read their configuration. Because if we've put the configuration into, like, an S3 bucket — this isn't real time, by the way, this is like maybe having your RANCID open-source config backup tool dump it into S3 — then it can read from S3: is it currently permitted? Because how much time as a network engineer do you spend with people saying, you know, my host can't get to the thing, should I be able to get to the network that's doing this? So if you can have a bot that answers all those questions, imagine how much time you can get back. So I have no idea if that will work.
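The per-node "is it currently permitted?" check described above can be sketched as a first-match ACL evaluation over rules parsed from a config backup. Everything here is an illustrative assumption — the rule format, networks, and ports are invented, and a real version would parse them out of the RANCID dumps in S3:

```python
from ipaddress import ip_address, ip_network

# Hypothetical rules parsed from one interstitial node's config backup.
# Each rule: (action, source network, destination network, destination port);
# a port of None matches any port.
RULES = [
    ("permit", "10.1.0.0/16", "10.2.0.0/24", 443),
    ("deny",   "0.0.0.0/0",   "0.0.0.0/0",   None),  # implicit deny-all
]

def is_permitted(src: str, dst: str, port: int, rules=RULES) -> bool:
    """First-match ACL evaluation: the check a diagram-reading bot would
    run for each node on the path between the user and the destination."""
    for action, src_net, dst_net, rule_port in rules:
        if (ip_address(src) in ip_network(src_net)
                and ip_address(dst) in ip_network(dst_net)
                and (rule_port is None or rule_port == port)):
            return action == "permit"
    return False  # no rule matched: default deny

print(is_permitted("10.1.5.9", "10.2.0.20", 443))  # -> True
print(is_permitted("10.1.5.9", "10.2.0.20", 22))   # -> False
```

The bot's job would be to run this check against every node the diagram puts between source and destination, and answer "no" with the first node that denies the flow.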
58:49
58:49
I hope that it will work. Ask me in a couple of weeks after we have this hack day and we'll let you know whether we've succeeded.

58:58
I suspect it's a challenging problem, in that there are companies out there with products that do this, and it's taken them years to develop them to be robust enough for an enterprise use case. Forward Networks comes to mind as folks in this space doing this kind of stuff. But I'm intrigued; I really want to know where this goes. Another question is related to cost. We were talking about cost before, and you said, hey, the big cost comes in when you deal with that knowledge base. Is that what we were just talking about with what you do with Confluence, or any time you're ingesting RAG-style data? Is that where that cost is going to come in?

59:37
Yeah. It's funny, because even if you're just reading, like, one PDF, it still has to spin up the whole infrastructure of a knowledge base, which is actually an OpenSearch database in the background. And its minimum cost is around $17,000 a year. And that's huge, even if you're reading one PDF and training it on one PDF. So that's just not great. I'm hoping that as these technologies mature, we'll get to the point where it's much cheaper, and you're charged based on, like, the number of tokens ingested, or something more correlated to the amount of data. Because training on one PDF shouldn't cost $18,000 a year. That's just unreasonable. And I think our AWS team is understanding of that. We've been able to make one change to bring the cost down about 35%, which is helpful, but I want it to be a couple hundred dollars a year, or something more correspondent to the value it's generating for the business. And we're just not there yet. You have to really commit as an enterprise. And if you're, like, a mom-and-pop shop, you can't spend 18 grand a year on this widget.
1:00:44
1:00:44
Like, it's just not reasonable. So hopefully we'll get there soon.

1:00:47
Well, Kyler Middleton, thank you for sharing all of your experience and knowledge on this. This was absolutely fantastic. And if you're listening and you want to get into the details, you want to see everything, all the code, all the Terraform, et cetera, that Kyler's been working with, that's all at letsdodevops.com, which is her Substack.

1:01:07
I'm very active on LinkedIn. I host Day Two DevOps with Ned Bellavance on the same Packet Pushers network that you're on now, so please come check us out. And I just get around to as many conferences as will have me. I'm hopefully going to be in Philadelphia later this year at re:Inforce, so look for me there. If you do want to read more about this AI stuff on letsdodevops.com, I have a coupon code to read the stuff that's still behind the paywall. It's all becoming free, but letsdodevops.com/heavynetworking, with no space or dash or anything, will get you a free month trial to go read it all, copy it to your desktop, get that stuff down. And all the code is free on GitHub; it's all linked from the Substack, so you can do this in your own enterprise. That's letsdodevops.com/heavynetworking.

1:01:58
I didn't know you were going to do that until just this second. That's awesome. Kyler, thank you for that. Seriously. Again, Kyler gave me a couple of freebies so I could read and research for this podcast without having to sub to the Substack, but I will tell you: subscribe to the Substack, it's that good. It's really, really valuable information if you're at all interested in this stuff.

1:02:23
Anyway, thank you for listening to Heavy Networking today from the Packet Pushers podcast network. It's all content for your professional career development. And just some quick housekeeping items as we close. Merch: go to store.packetpushers.net, and don't overlook the collections link in the header of the store. We've got stuff for every show in the podcast family. We have a newsletter — we have multiple newsletters, but I'm going to focus on the Human Infrastructure newsletter today. Drew and I publish that every week. We share the best blogs, news, vendor announcements, resources, and of course memes that we have found. Everything you need to know from the world of networking and tech, sent to your inbox with love.

1:03:02
AutoCon 3 is our last housekeeping note today. AutoCon is the industry's only conference devoted to network automation, and it is coming to Prague in late May 2025. The Packet Pushers team is going to be there, and we would love to see you. Visit networkautomation.forum and get a ticket for AutoCon 3 while you still can. This event does have an attendance cap, and it will sell out; it's just a matter of time. So if you're interested in going abroad for AutoCon 3, go buy your ticket at networkautomation.forum.

1:03:37
And if you enjoyed our conversation with Kyler, you again should subscribe to her podcast, Day Two DevOps, with Ned Bellavance, and her Let's Do DevOps Substack. That would be pretty swell of you. If you have comments or questions about this show, send them to us via packetpushers.net/follow-up. And until next week, just remember: too much networking would never be enough.