Episode Transcript
0:00
Your network doesn't operate in a
0:02
vacuum. Every change you make has
0:04
a direct business impact. So why
0:06
make changes quietly in your silo?
0:08
Orchestrate your network automations
0:11
to integrate with the rest
0:13
of the business using ITential.
0:15
Visit ITential.com to find out
0:17
more. That's ITential.com. On
0:19
today's heavy networking, we will discuss
0:22
building a Slack bot wired to
0:24
an AI and trained on
0:26
your own organization's knowledge. The
0:28
potential use cases for network
0:30
operations are fascinating, and indeed,
0:32
we know of companies like
0:34
Selector AI that are training models
0:36
on real-time network infrastructure telemetry,
0:38
changing how we manage our networks. I'm
0:41
Ethan Banks, along with Drew Conry-Murray.
0:43
Follow us on LinkedIn, Bluesky, and
0:45
the Packet Pushers Community Slack. Our
0:47
guest is Kyler Middleton. She's the co-host
0:49
of the Day Two DevOps podcast with
0:51
Ned Bellavance. So if her voice sounds
0:54
familiar, that might be why. Kyler's been
0:56
publishing a detailed, instructive series on her
0:58
Let's Do DevOps Substack about her AI-enabled
1:00
Slack bot. And she definitely draws the
1:02
rest of the owl. And she was
1:04
gracious enough to share even more
1:07
time with the community to record
1:09
with us about what she's built,
1:11
lessons learned, and suggestions for what
1:13
the rest of us might be
1:15
able to create, inspired by her
1:17
project. So,
1:19
Kyler, welcome to Heavy Networking. And so
1:22
these articles that you've posted on
1:24
your Substack: some of
1:26
your Substack is paid and some
1:28
of it's not. What is the status of
1:30
these articles? Yeah, this is an ongoing series.
1:32
I sort of have these ideas of like,
1:35
could I build that? And then
1:37
a week later, I come out of
1:39
a caffeine haze and I think, oh, okay,
1:41
I could. I've ignored my job for a
1:43
week, but I built it. So the leading
1:45
maybe two or three articles are paid and
1:48
the rest of it's free. The status of
1:50
this series is 22,000 words so far, and
1:52
we're on the first two in a series that's
1:54
going to be maybe eight, because I want
1:56
to stream the tokens back to Slack. We'll
1:58
get into that. Just like ChatGPT, I'm
2:00
building ChatGPT basically, but for your
2:03
private enterprise; that'll be like article 8
2:05
or something like that in this series.
2:07
Okay, I'd read the first four, you
2:10
just published part five, which gets into
2:12
some of the RAG and augmentation stuff,
2:14
and there's gonna be three more after
2:16
that, at least you're saying, goodness. Yep,
2:19
absolutely. I want to do interstitial little
2:21
posts that say, hey, I'm talking
2:23
to the knowledge base.
2:26
Okay, now I'm chatting with AI, and
2:28
then start streaming it back to Slack.
2:30
So it's not just a post, and
2:32
then 10 seconds later you get an
2:35
answer. So it's not that you drew
2:37
the owl, you drew the owl in
2:39
a tree in a forest with a
2:41
lovely lake nearby. Exactly. The owl has
2:44
an ecosystem that it lives in.
2:46
Really, this is all intentional, and
2:48
it seems ridiculous, but I want
2:51
to make this very accessible. And
2:53
the way that I can make
2:55
it the most accessible is to
2:57
include
3:00
the whole owl. And I want you
3:02
to be able to follow along. All
3:04
this code is published. All the
3:07
code, it's MIT Open Source. If
3:09
you want to go steal it,
3:11
if you want to go sell
3:13
it, that's fine. It's MIT Open
3:16
Source. Go do it.
3:18
It's on GitHub. But really, I would
3:20
love for you to implement this in
3:23
your own enterprise, because it's useful. And
3:25
it can do useful stuff. And I
3:27
don't want anyone to be excluded, because
3:29
they don't know what a lambda is,
3:32
or they don't know how to write
3:34
Python code. I can do that part.
3:36
I have provided all that complexity. Really,
3:39
it's a lot of knowing the
3:41
right libraries, knowing how to construct some
3:43
of the right statements, but it's not
3:45
like thousands of lines of complex Python.
3:48
Not at all. It's under a
3:50
thousand lines of Python. It's using some
3:52
external stuff, like the Boto3 library
3:54
from AWS, but it's not even a
3:57
thousand lines of code. And you
3:59
can ask an AI to explain all
4:01
the steps. So, yeah. Okay, so we
4:04
should zoom out. Let's pretend we're the
4:06
AI. And if you've got to explain
4:08
to someone at a high level what
4:10
you've built here, I mean, there's a
4:13
slack bot, it's tied to an AI
4:15
model. Tell us what this thing is.
4:17
Totally. The elevator pitch of the three
4:20
words of like, you know, the Facebook
4:22
for ice cream is, this is a
4:24
ChatGPT for your private enterprise. So
4:26
I'm in a regulated industry, that's my
4:29
primary job is at Veradigm as a
4:31
software engineer, and we're in health care.
4:33
And we have, you know, a lot
4:36
of health care data. And that means
4:38
that we need to be really cautious
4:40
with what we let people upload and
4:42
where our data goes, because our CISO
4:45
could like go to prison if we
4:47
do a bad enough job of this.
4:49
Gen AI is really powerful, right? It
4:52
hallucinates and does nonsense things, but occasionally
4:54
is brilliant and it's really helpful for
4:56
writing code and doing, you know, all
4:58
sorts of stuff. Tell me a poem
5:01
about a pirate. Tell me about a
5:03
kitty. I've been giving my three-year-old stories
5:05
from ChatGPT the past few nights.
5:07
And, uh, it's so useful, but it's
5:10
so excluded for like regulated industries because
5:12
all your data is being collected and
5:14
trained on the Facebook model, the Google
5:17
model. It's free because you're the product.
5:19
And so at these companies, you just
5:21
shouldn't. You can't. Well, you can, but you
5:23
shouldn't be using ChatGPT or Google's public
5:26
models or DeepSeek, probably. So what
5:28
I wanted to do is bring that
5:30
power to industries that are excluded because
5:33
of their privacy, like governments and finance
5:35
and health care, and be able to
5:37
use it privately. So I'm using it
5:39
and I'm pitching it internally as, like,
5:42
you can have it analyze contracts, you
5:44
can have it read resumes and give
5:46
you information, and, like, you should never
5:49
do those things with public AI. But
5:51
you can do it with private AI
5:53
safely, and that's pretty cool. That's
5:55
the goal. So give us some use
5:58
cases that might be interesting for infrastructure
6:00
engineers, people that, like, for this audience,
6:02
folks that manage network infrastructure. I have
6:05
been staggered at the amount of what
6:07
you would probably consider like expert level
6:09
expertise at writing Splunk queries. Like if
6:11
you don't understand what KQL is or
6:14
how to write a Splunk query in
6:16
it, you can have this tool do
6:18
it and it does an excellent job.
6:20
And this gets on to where I'm
6:23
building towards, but I've had it read
6:25
our entire Confluence, because that's supported by
6:27
the AWS Bedrock data source toolkit. It's
6:30
beta, but it works. And we have
6:32
a bunch of guides on how to
6:34
write Splunk or how to write Terraform
6:36
in our standards. And so now this
6:39
model spits out perfectly formatted Splunk queries
6:41
or Terraform or ACL updates, or tells
6:43
you exactly how to apply for an
6:46
exception to our manual change policy, with
6:48
our corporate standards, immediately. And you don't
6:50
even have to talk to a human,
6:52
which some engineers really appreciate. I do
6:55
some days, too. So can you walk
6:57
through sort of the high-level big pieces
6:59
of the system that you've put together?
7:02
Yeah, absolutely. So this uses the Bolt
7:04
framework from Slack, which sounds scary, but
7:06
really it's just a little Python library
7:08
that you can use. So Slack is
7:11
the interface where I put my query
7:13
to start this whole mechanism running? Absolutely.
7:15
Yeah, let's start from there. So you
7:18
go into Slack and I have a,
7:20
excuse me, a Slack app that's registered
7:22
in Slack, and you can either direct
7:24
message it or tag it into a
7:27
shared room. And I figured that's the
7:29
best place to start. I could build
7:31
a web page or something, but everyone's
7:33
in Slack or everyone's in teams, which
7:36
I'm going to build in the future.
7:38
And when you message this bot, which
7:40
I call Vera, which is the Latin
7:43
word for truth, we'll see if AI
7:45
can stick to that, she'll do her
7:47
best. And it sends a webhook out
7:49
to a Lambda function URL that spins
7:52
up a Lambda that's written in Python.
7:54
Why is it in Lambda and not, you know,
7:56
just a server, just a server running
7:59
somewhere? Patching is terrible, and eventually your
8:01
server has to reboot, and then it
8:03
breaks your thing. And Lambda doesn't ever
8:05
reboot. And I love that. Lambda is a
8:08
serverless service from AWS, yes. Yeah, absolutely.
8:10
Okay, so that means you're not running
8:12
infrastructure to support this because you're using
8:15
these tools. Not at all. It's Python 3.12, which
8:17
is supported until 2028. There's no underlying
8:19
operating system that I have to patch
8:21
or reboot or monitor or anything. It
8:24
just spins up and processes a conversation
8:26
and then spins down. There's also the
8:28
side benefit that it can scale out
8:31
almost indefinitely. So if I want to
8:33
have 10,000 conversations at once, I could,
8:35
I'm never going to get there with
8:37
this product, but it's possible. And the
8:40
bill would probably be staggering. It would
8:42
be. Well, it's kind of surprising because
8:44
I have numbers for the costing and
8:46
it's almost nothing. I've processed about 2,000
8:49
conversations so far and it's cost about
8:51
12 bucks. So it's really cheap. Comparatively...
8:53
let's talk about that later, because I
8:56
have so much to say about that. So the
8:58
Lambda gets the conversation. It reads the
9:00
entire Slack thread using Bolt's
9:02
API endpoints for Slack and constructs the conversation
9:05
that it sends over to the Bedrock
9:07
APIs. Bedrock is an AWS AI
9:09
endpoint system. We're using what's
9:12
called serverless on their side, which
9:14
means you don't have to have an
9:16
AI model provisioned. Starting cost: $30,000, which
9:18
I'm not quite ready for, for this,
9:21
you know, homegrown lab thing and There's
9:23
a little bit more safety and security
9:25
built in on the bedrock side, but
9:28
I'll skip all that stuff and Bedrock
9:30
lets you pick which models are going
9:32
to be sort of the base of
9:34
this. Yeah, absolutely. I'm using Anthropic's Claude
9:37
3 Sonnet, but you can pick whatever
9:39
you would like. Whatever is in Bedrock,
9:41
that is, but they've got a huge
9:44
selection. Exactly. Yeah, you can import your
9:46
own, but again, when you import a
9:48
model or train a model, they run
9:50
it for you, and the base cost is around
9:53
30 grand a
9:55
month. So unless your product is built
9:57
around this, it's just out of reach
9:59
for everyone. So as long as it's
10:02
available in their serverless library, which is
10:04
a ton of stuff you've heard of,
10:06
the OpenAI models are available, Claude's
10:09
models, Google's Gemini models are all there,
10:11
and they're charged based on tokens, and
10:13
it's something like a million tokens for
10:15
five bucks. And most of these conversations
10:18
use about 500 tokens. So the math
10:20
of that is staggering. It's almost nothing.
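The back-of-the-envelope math here is easy to sketch. This uses only the ballpark figures from the conversation (about $5 per million tokens, about 500 tokens per conversation); actual Bedrock pricing varies by model, and the observed ~$12 presumably also includes Lambda and other AWS charges:

```python
# Rough token-cost math using the ballpark figures from the conversation:
# ~$5 per million tokens, ~500 tokens per conversation.
PRICE_PER_MILLION_TOKENS = 5.00   # USD, ballpark serverless rate
TOKENS_PER_CONVERSATION = 500     # typical conversation size

def token_cost(conversations: int) -> float:
    """Estimated model cost in USD for a number of conversations."""
    total_tokens = conversations * TOKENS_PER_CONVERSATION
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# 2,000 conversations is about 1M tokens, so about $5 of model spend.
print(round(token_cost(2_000), 2))
```

Compare that with a seat-based product at $10 per user per month for 50 users, which is $500 a month regardless of usage.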
10:22
Especially compared with enterprise products that serve
10:25
this, where they're seat-based and you have
10:27
to pay per user: you have 50 users,
10:29
so you have to pay $10 a
10:31
month for each user. That's $500 a
10:34
month and this is going to serve
10:36
that same need for like maybe the
10:38
cost of two or three Starbucks a
10:41
month. It's really a huge difference. The
10:43
Slackbot itself, I had built a Slackbot
10:45
before Slack retooled how you build an
10:47
application in Slack. This goes back a
10:50
few years, but you had to have
10:52
some code that was basically sitting there
10:54
listening
10:57
to the Slack channel and reacting. Is
10:59
there still a piece like
11:01
that, or with the new way you
11:03
do Slack bots these days, does
11:06
Slack kind of do that for
11:08
you or? That's how I develop it
11:10
locally is I run the Python thing
11:12
and it starts the Slack listener that
11:15
I connect to via, like, an ngrok
11:17
endpoint, which is an open source tool
11:19
that lets you receive public web hooks
11:22
and send them to your local listener.
11:24
But for the real production one that's
11:26
running in Lambda, it uses what's called
11:28
a function URL, which means if it
11:31
receives a connection, the listener
11:33
is the Lambda infrastructure from AWS: it
11:35
spins up your Lambda in about a
11:38
quarter of a second and starts processing
11:40
the event, like right away. So it's
11:42
not sitting there charging you money, it
11:44
just is ready to run your Lambda.
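As a rough illustration of what a function-URL Lambda does when Slack calls it, here is a minimal stdlib-only sketch. The real bot uses Slack's Bolt framework and boto3, so the handler body here is illustrative; the `url_verification` challenge echo, though, is how Slack validates an Events API endpoint:

```python
import json

def lambda_handler(event, context):
    """Entry point AWS invokes when the function URL receives a request.

    Function URL events carry the HTTP body as a string; Slack sends JSON.
    """
    body = json.loads(event.get("body") or "{}")

    # Slack's Events API sends a one-time url_verification handshake when
    # you register the endpoint; you must echo the challenge back.
    if body.get("type") == "url_verification":
        return {
            "statusCode": 200,
            "body": json.dumps({"challenge": body["challenge"]}),
        }

    # Otherwise it's a message event: read the thread, call Bedrock, reply.
    # (Elided here; the real bot does this with Bolt and boto3.)
    return {"statusCode": 200, "body": "ok"}
```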
11:47
And the trigger is to spin that
11:49
up: when an input is received from
11:51
the Slack command line and goes into
11:54
Slack, there's a webhook that fires,
11:56
reaches into Lambda, because I've read this
11:58
part of your post, then the Lambda
12:00
instance spins up and begins processing. Yeah,
12:03
and it uses IAM permissions. So
12:05
there's almost no static keys. Static
12:07
keys are the worst. We know that
12:10
as infrastructure engineers. So it's all
12:12
IAM dynamic stuff. Everything on the AWS
12:14
side is keyless authentication. They're the worst
12:16
except for their incredible convenience. Oh, they're
12:19
so convenient. And they don't require fetching
12:21
anything. But IAM is such a bear
12:23
to learn at first. But now that
12:25
I've got it, I'm starting to like
12:28
it. It only took 10 years. I'm starting
12:30
to like it. Okay, Bedrock. We've talked
12:32
about this a lot, and Bedrock, as
12:35
I understand it (correct me if
12:37
I'm wrong), is the service that provides
12:39
the AI model. And AWS is notorious
12:41
for having many services. So what is
12:44
Bedrock specifically, and why did you select it over
12:46
any other AI-related options that AWS
12:48
might offer? Well, I actually kind of
12:51
didn't. I started building this with Lambda
12:53
because I built a project with that
12:55
previously and I like it. And I
12:57
was going to send everything over to
13:00
Azure AI because I don't know. That's
13:02
what my company's kind of standardized on.
13:04
And I really like how they've done
13:07
their AI models. It's the same as
13:09
bedrock, but it's hosted in the Azure
13:11
cloud from Microsoft. And then one of
13:13
our architects said, well, you'd send all
13:16
the traffic to AWS so you can
13:18
send all the traffic to Azure? That
13:20
doesn't make any sense. And I thought
13:23
I can either update this to the
13:25
Bedrock AI endpoints from AWS, or I
13:27
can rewrite this Lambda as a function
13:29
URL, and serverless is so different between
13:32
clouds. And so is authentication and so
13:34
is all the standards they use for
13:36
just how things run at work. And
13:38
I thought I would much rather learn
13:41
Bedrock than learn how function URLs actually
13:43
work in Azure. So I said, we're
13:45
going to AWS. So is bedrock the
13:48
only AI-related service that I would
13:50
be considering? There's so many services in
13:52
AWS, I don't keep up. Is
13:54
Bedrock it? There's so many services, and
13:57
I don't understand all of them. There
13:59
are some other machine learning services that
14:01
serve needs like this, but for
14:04
smaller projects like this, where you're processing
14:06
an input, especially this conversational gen AI
14:08
type AI, bedrock is probably where you're
14:10
working at. They're starting to put all
14:13
of their serverless models and their simple
14:15
guardrails, which sort of monitor for, like,
14:17
inappropriate content in and out of models.
14:20
That's all in bedrock. So yeah, that's
14:22
probably where you're starting. The examples are
14:24
really good. Some of them are hidden
14:26
away in GitHub example repos that are
14:29
a little hard to find, which means
14:31
you can read, from Kyler's blog Let's
14:33
Do DevOps, how to do it.
14:36
And like that has all the pictures
14:38
and stuff. This changes so much, especially
14:40
behind the scenes, but some of the
14:42
like front end changes too that it's
14:45
hard to follow along with AWS docs in
14:47
any real way and have them make
14:49
sense because it's changing, you know, like
14:51
the whole field is changing. It's not
14:54
AWS's fault. This is just, it's a
14:56
moving target for technology. So I think
14:58
I heard you say there are, you
15:01
could also access publicly available models
15:03
in Azure, but you didn't want to
15:05
learn Azure serverless functions, so you just
15:07
stuck with Lambda. So does that mean
15:10
if I'm in Azure or I'm in
15:12
GCP, they also have similar services to
15:14
bedrock? Yeah, absolutely. I haven't done a
15:17
lot with GCP, but Azure definitely does.
15:19
I like their implementation a little bit
15:21
more than Bedrock. On the AWS side,
15:23
if you want to use a guardrail,
15:26
which is sort of monitoring for inappropriate
15:28
input and output, you specify it in
15:30
your code, strangely. So like, if someone
15:33
wanted to bypass it, they could comment
15:35
it out. And then you no longer
15:37
have any guardrails. And that's a
15:39
strange choice. In Azure, you deploy a model
15:42
to an endpoint, and when you
15:44
do so, you specify a guardrail, and
15:46
it's just implicitly invoked whenever you talk
15:49
to the model. So there's no, you
15:51
know, bool flag that you pass them
15:53
that says, hey, don't check this for
15:55
safety this time. You just have to
15:58
use it, which... I much prefer
16:00
that standard. Oh boy, I bet we're
16:02
going to be hearing about that someday.
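To make the contrast concrete: with Bedrock's Converse API, the guardrail rides along in each call's `guardrailConfig`, so it lives in code that someone could comment out. A hedged sketch follows; the guardrail identifier is a placeholder, not a real resource:

```python
# Sketch: on Bedrock, the guardrail is attached per-call in your code.
# Identifiers below are placeholders, not real resources.

def converse_kwargs(user_text: str, with_guardrail: bool = True) -> dict:
    """Build kwargs for bedrock-runtime's converse() call."""
    kwargs = {
        "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
    }
    if with_guardrail:
        # This is the part that could simply be commented out,
        # which is the risk being discussed.
        kwargs["guardrailConfig"] = {
            "guardrailIdentifier": "my-guardrail-id",  # placeholder
            "guardrailVersion": "1",
        }
    return kwargs

# client = boto3.client("bedrock-runtime")
# client.converse(**converse_kwargs("hello"))
```

On Azure, by contrast, the content filter is bound to the deployed endpoint, so there is nothing in the calling code to delete.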
16:04
Yep, someone will have forgotten it
16:07
or their code will like turn it off
16:09
on accident and then it just goes
16:11
nuts because they do. And I've actually
16:13
had the AI go crazy a couple
16:15
of times with some of my bad
16:17
code, which is really fun to watch
16:19
your software project go absolutely loony. It's
16:21
been a ton of fun. But the
16:23
general rule, if you are paying for
16:25
the AI per like... tokens, especially
16:27
these cloud platforms, your privacy
16:29
will be respected. And I've
16:31
seen, I have read good things
16:34
from GCP and Azure and AWS
16:36
for their AI services. But if
16:38
you're paying for something like an OpenAI,
16:40
like these other sort of
16:42
public AI platforms, they don't have
16:44
a proven track record where they're
16:46
respecting your privacy. So especially for
16:48
like regulated industries in finance and
16:50
healthcare, I would be very cautious
16:52
using those. But for these hyperscaler
16:54
platforms. I feel much more comfortable.
16:56
They have access to so much
16:58
of our data already in S3
17:00
buckets and databases and servers, but it
17:03
just doesn't make a dent in the risk
17:05
that we're exposing ourselves to to, you know,
17:07
have our AI there too. We're already screwed,
17:09
so you might as well keep going. Kind
17:12
of, yeah, all our eggs are in the
17:14
basket. Let's put another egg on top. Side
17:16
note here, I'm curious, did you... ask anybody
17:19
at work in a regulatory
17:21
position or compliance position about
17:23
this project before you started
17:26
or you're just going for
17:28
it? I did eventually ask for
17:30
permission, but I wanted to see
17:32
if it would work first. And
17:34
that's really a bad standard. But
17:36
I started with data that we
17:38
don't really care if people read. So
17:41
our Confluence is, like, the wiki from
17:43
Atlassian. And that data, that has historically
17:45
just been open to everyone. Everyone can
17:47
read, everyone can write for almost everything.
17:49
So I thought, if this just goes
17:51
crazy and it starts to spit out random
17:53
facts to people, it doesn't really matter,
17:55
because everyone has access to this data
17:58
already; it's just provided in Slack instead
18:00
of a web browser. What I want to get this
18:02
to eventually is the sort of transitive
18:04
security model where if, you know, Ethan's
18:07
permissions are different than Drew's permissions, when
18:09
you talk to the model, the AI
18:11
can access a different tier of data
18:13
or different sources of data and your
18:16
sort of permission schema gets transitively applied
18:18
to what the model can do. That's
18:20
well beyond what models can do, like,
18:22
state-of-the-art-wise. I might be
18:24
able to kind of hack it together with
18:27
different knowledge bases and data sources and sort
18:29
of turn them on or off for different
18:31
people, but that's really a hacky solution to
18:33
this problem that would be better served with
18:35
something much more elegant. So that's... And I
18:37
can just ask the AI to pretend I'm
18:39
Ethan and then give me Ethan's...
18:42
Absolutely, absolutely, yeah. Aren't
18:44
you devious? Wow, right there.
18:46
Thinking two steps ahead, man.
18:48
A quick sponsor message from
18:51
the network orchestration folks
18:53
at ITential. Automating
18:55
network configuration change is
18:57
a major milestone for a netops team.
18:59
Well, what is next? Orchestrating the entire
19:02
workflow. Because network changes don't begin when
19:04
you kick off a script. Network changes
19:06
begin with a business process, such as
19:09
a ticket coming in, and then you
19:11
have a change to meet the needs
19:13
of that ticket proposed. Testing is
19:15
then performed to make sure the change
19:18
will do the right things, and then
19:20
a human approves, and the change is performed,
19:22
and post-deployment testing is done, and notifications
19:24
are sent out, and the ticket is
19:26
updated, and your process no doubt varies,
19:28
right? But you get the idea. Now, what
19:30
if you could take that entire
19:33
workflow and orchestrate it so that
19:35
your manual interaction with the ticketing
19:38
system and with ServiceNow and
19:40
so on, if that's all handled
19:42
by an automated workflow that you
19:45
designed to work with the specific needs
19:47
of your shop, that would make
19:49
you more efficient, right? It would
19:51
increase the likelihood of not only
19:54
the change getting done, but the
19:56
entire business workflow being
19:58
completed without error. You
20:00
get a complete set of tooling that
20:02
gives you all the power you could
20:04
need to have your company's IT and
20:07
business platforms interacting smoothly. When your
20:09
network operations are ready to evolve
20:11
to robust orchestration of your network
20:13
changes, ITential should be on your
20:16
short list of platforms to evaluate.
20:18
To find out more about ITential's
20:20
products and how they help
20:22
you orchestrate network automation workflows,
20:25
visit ITential.com. That's ITential.com, and
20:27
tell them Packet Pushers sent
20:29
you. Kyler, you mentioned Python
20:31
3.12. Is that critical for the
20:33
functioning of this? Or could I
20:36
get away with something older? I
20:38
mean, lots of default Python installations
20:40
for OSes are somewhat older than
20:42
3.12. You could totally do something older.
20:45
I picked 3.12 because it was the
20:47
most recently supported Python 3 for
20:49
Lambda in AWS that I would
20:51
not have to patch. Because just...
20:53
Engineers are really lazy and I
20:56
fit this model very very well. I
20:58
don't want to do this again. And
21:00
so by 2028, you know, maybe I'll
21:02
have another job. Maybe I'll have moved
21:04
on to a different thing. Someone
21:06
else will solve that problem. I've
21:09
put it so far out in the future.
21:11
And that's the goal. So use as
21:13
new as you can because then you, you
21:15
know, it's probably someone else's
21:17
problem when it breaks in a couple years
21:20
when it goes end of life. In the long
21:22
run, does it matter? Or what kind of
21:24
things should I be thinking about when I'm
21:27
going through with a vast menu of
21:29
models to pick from? There's so
21:31
much to this answer, but let's start with
21:33
just testing models. So the one that I
21:35
built is built on Anthropic Claude 3
21:37
Sonnet, which is the latest model
21:40
from the Anthropic
21:42
company, which is one of those sort
21:44
of big companies building AI models. And
21:46
you can absolutely test these models on
21:49
the bedrock platform. So within the console,
21:51
you can do side by sides with
21:53
I think up to three different models and
21:55
ask them the same questions or ask the
21:57
same model with different parameters, like
22:00
their temperature and their top-p,
22:02
of, you know, generate an answer
22:04
to this question. And you can
22:06
sort of measure how they do.
22:08
And so that's what I did first
22:11
is, like, do some big models. I
22:13
did OpenAI's, I think,
22:15
o3, and Anthropic Claude and
22:17
Gemini and Titan from AWS,
22:19
and just see how they do.
22:21
And Anthropic was quite a bit
22:24
better at understanding programming, which I
22:26
built this first of all to
22:28
be like a programming assistant for
22:30
our software engineers and our SRE
22:32
team. And so that was kind
22:34
of an easy choice for us
22:36
to do. But we're using this
22:39
particular API from AWS called the
22:41
Converse API, and that's a fancy
22:43
word for: it's sort of a meta
22:45
API where it has a standard interface
22:47
no matter what model you use because
22:49
all these models, they're built a little
22:51
different. Their APIs are different for how
22:53
they expect data, and the formatting
22:56
of documents, etc. So the Converse API
22:58
standardizes that: it's one API call, and
23:00
it can talk to any model on the
23:03
back end. They sort of reformat your
23:05
API call and pass it to the
23:07
model, which is really cool in terms
23:09
of like it has some support for
23:11
document types like you can pass it
23:13
a spreadsheet and it'll understand it where
23:15
the models might not. But the side benefit
23:17
of that is I can flip over
23:19
to a different model in about five
23:21
minutes. I don't have to reformat how
23:23
I'm constructing all those API calls. You just
23:25
specify a different model name, and Converse
23:27
will convert it for that and send it
23:29
over. So big fan of that, it has
23:32
worked really well. And if we ever decide
23:34
that, you know, that 4o from OpenAI is
23:36
looking really cool, we can probably just
23:38
test it out by changing to a
23:40
different name when
23:42
you're declaring which model you want
23:44
to talk to. Accuracy of response,
23:47
speed of response, were there other things you
23:49
were looking for? No, that's kind of
23:51
it, and it's all very gut-feelingy.
23:53
It's very unscientific at this point, because
23:56
this is very much a lab project I
23:58
just built by myself. For real
24:00
AI engineers that are building stuff
24:02
that handles like health care
24:04
data and other like user
24:06
facing stuff, there's testing suites
24:08
where you pass in, you know, 500
24:11
different tests, and you analyze the
24:13
responses generally with another AI model,
24:15
which is kind of funny, AI
24:17
judging other AI responses, and you
24:19
score them and you can tell
24:22
in this really scientific methodology whether
24:24
it's better or worse to go
24:26
to a different model and like
24:28
how does it handle typical questions
24:30
that we have. But this is Kyler-ware,
24:33
where I just do it and I'm
24:35
like, oh yeah, that seems like a
24:37
better answer to me. Let's use that
24:39
one. In the future, we'll probably do
24:41
both automated measuring of sort of a
24:43
formal methodology of what's better, worse
24:46
for different models and standards and
24:48
parameters, but also something called data
24:50
grounding, where you can give the
24:52
correct answers to binary questions. So
24:54
like, what color is a stop
24:57
sign? It's red and white. And
24:59
so you can have it measure
25:01
whether that answer is accurate. And
25:03
you can provide it like hundreds
25:06
or thousands of questions where it
25:08
has to get the answers right.
25:10
And those responses can be measured in
25:12
real time. That's a new thing in
25:14
bedrock. I don't have that turned on
25:17
yet, but I want to. I just
25:19
need to write some binary questions, ones that
25:21
sort of have a real answer,
25:23
not gut-feeling
25:26
style. And it'll be able to measure
25:28
those responses from the model, whether they're
25:30
factually correct. So it's a different AI
25:32
that spins up in real time and
25:34
measures the response back to the user
25:36
and says, like, oh, this is accurate
25:38
enough. It passes my threshold of, I'm
25:40
going to let it go back, versus
25:42
this is total nonsense. This disagrees with
25:44
the things I know are true. I'm
25:46
going to block it and send an
25:48
error message instead. That's much more useful
25:50
for user facing stuff that's thousands of
25:52
responses a day, but I'm learning how
25:54
it works so I can do that
25:56
cool stuff one day. Yeah, so you're saying there
25:58
are, like, rigorous methods for testing, but
26:01
this is a lab project, so vibes
26:03
suffice. It's vibes, exactly. The vibes are
26:05
good, so we're building the thing. And
26:07
I'm kind of just bolting these on.
26:09
This is definitely one of
26:11
those projects where it's resume-driven
26:14
development. I wanted to just
26:16
learn how it worked, and I came up
26:18
with an excuse. And so far, that's working
26:20
great. I'm not a PhD, I'm not
26:22
a math whiz, but I'm an ops kid
26:24
that likes to play with software.
26:26
And so far, that's good enough. You
26:28
said that the model you selected
26:30
was better with programming responses specifically.
26:33
Was that a seat-of-the-pants vibe
26:35
kind of thing? Like, I don't
26:37
really like the answer I got
26:39
from this other model, but this
26:41
one, yeah, Anthropic's really doing it right.
26:45
Yeah, I measured the Anthropic Claude Sonnet
26:45
versus AWS's Titan model. I think
26:48
that's the name of their model,
26:50
their newest sort of general AI
26:52
text processing model. And I asked
26:54
it specific SRE-type questions, software
26:57
engineer questions, and the AWS model
26:59
said, you know, you should probably talk
27:01
to a software engineer. And I'm like,
27:03
no, I know I can talk to
27:05
a software engineer. I'm talking to you.
27:08
Give me your best answer, particularly about
27:10
questions of like, how do these
27:12
AWS services work, which I feel like AWS's
27:14
AI should probably be trained on that a
27:16
little more. They should know how the AWS
27:18
cloud works. I'm just saying. So yeah,
27:20
I just tested a couple of different
27:23
models with different programming questions, sort
27:25
of like an interview. It's kind
27:27
of like I'm interviewing them for
27:30
a job, which is a really
27:32
apt analogy here. I noticed in
27:34
one of your posts that there is some
27:36
model tuning you could do to get the
27:38
sort of answer that you're looking for. Like
27:40
you wanted it to give you kind of
27:42
an engineering friendly answer with, with not too
27:45
many hallucinations, but also not too restricted. Because
27:47
if I remember right, the way you wrote
27:49
the post, if you can tune it in
27:51
such a way that you can hardly get
27:53
anything useful out of it, but if you
27:55
let it go crazy, you'll get a lot
27:57
of bogus data. How do you do that tuning?
27:59
It's so fascinating, it's so little like
28:02
programming and so much like talking to
28:04
maybe like a junior style engineer that's
28:06
not confident in themselves, because if you
28:08
are opinionated enough, it will agree with
28:11
you, no matter what. And I am
28:13
a confident person, and I've had this
28:15
trouble with junior engineers before, where I
28:18
say something so confidently and so wrong,
28:20
and they'll agree with me, because like,
28:22
you're so confident, you must know what
28:24
you're talking about, and that's not true.
28:27
I just come across that way. You
28:29
are able to set a couple of
28:31
parameters for most of these models and
28:33
the parameters sometimes differ but the big
28:36
ones that people should know about are
28:38
temperature and top P and temperature is
28:40
from zero to one and it's the
28:42
amount of creativity sort of freewheeling that
28:45
you permit the model to do and
28:47
you can sort of turn the creativity
28:49
all the way up to one. Oh
28:51
and it will just make stuff up.
28:54
Which, like, we've all met engineers that
28:56
do that? Maybe me too. And... what
28:58
qualifies as making stuff up? I mean, it's
29:01
not going to be purely a random
29:03
answer. It's still an LLM. It's still
29:05
following some kind of context or, you
29:07
know, language chain. And to give you
29:10
words that in theory should be plausible,
29:12
it's so it's not just making things
29:14
up, right? I think it's the amount
29:16
of reward that the model is given
29:19
for agreeing with you. and for telling
29:21
you positive answers. So does the moon
29:23
go around the sun? And if the
29:25
temperature's one, oh, it'll say, of course
29:28
it does, and it'll explain how Galileo
29:30
proved that the moon
29:32
goes around the sun. And like, that's
29:34
not true, but the model's reward for
29:37
saying yes is high. So it'll do
29:39
it. It'll be rewarded for just lying
29:41
to your face. So what we've done
29:44
for this, this is supposed to be
29:46
a model that doesn't lie. It grounds
29:48
its information based on what's real and
29:50
not just what I want to hear,
29:53
which is... very much preferable in an
29:55
engineering context, is turn the temperature way,
29:57
way down. I'm currently at point one.
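[Editor's note: for anyone wiring this up themselves, here is a sketch of how a low temperature like that might be passed to Bedrock's Converse API. The model ID is illustrative, and this only assembles the request kwargs; no AWS call is made.]

```python
import json

# Sketch only: build the kwargs you would hand to bedrock_runtime.converse().
# The model ID below is illustrative.
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def build_converse_request(user_text, temperature=0.1, top_p=0.9):
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {
            "temperature": temperature,  # low = grounded, high = freewheeling
            "topP": top_p,               # nucleus-sampling cutoff
        },
    }

request = build_converse_request("Does a basketball fall to the earth because of gravity?")
print(json.dumps(request["inferenceConfig"], sort_keys=True))
# → {"temperature": 0.1, "topP": 0.9}
```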
29:59
I could probably go smaller. I think
30:02
I can go to hundredths of a place and
30:04
not just tenths. But the problem is
30:06
that when you get really low, it
30:08
stops being able to kind of make
30:11
this sort of inference-style reasoning, where:
30:13
If it knows that a marble falls
30:15
to the earth because of gravity and
30:17
its temperature is zero, and you say,
30:20
does a basketball fall to the earth
30:22
because of gravity? And it'll say, no,
30:24
I don't have information to back that
30:26
up. I can't make a deduction or
30:29
an inference. I know marbles do, but
30:31
I can't infer that anything else is
30:33
also subject to gravity. Right. Exactly. Which
30:36
is unreasonably grounded in reality. And so
30:38
really you want it to be able
30:40
to make some inferences. If you can
30:42
write a loop in Python, you can
30:45
probably write a loop in Bash, and
30:47
I'll go find out how. So you
30:49
want the temperature to be a little
30:51
bit high, a little bit up. Again,
30:54
I started with like point three, and
30:56
I'm trying to get it to point
30:58
one. It still makes stuff up sometimes.
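[Editor's note: as a toy illustration of what these two knobs do, not Bedrock's actual sampler, the standard softmax-with-temperature trick makes the same raw next-token scores near-deterministic at a low temperature and much flatter at a high one, and a top-P cutoff then drops the long tail.]

```python
import math

# Toy demo: divide raw model scores (logits) by the temperature before softmax.
def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_cutoff(probs, p):
    """Keep the smallest set of highest-probability tokens whose mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept

tokens = ["falls", "floats", "explodes"]
logits = [4.0, 2.0, 0.5]                       # made-up scores

cold = softmax_with_temperature(logits, 0.1)   # p("falls") is essentially 1.0
warm = softmax_with_temperature(logits, 1.0)   # probability spreads out
print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])
print([tokens[i] for i in top_p_cutoff(warm, 0.9)])  # tail token dropped
```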
31:00
AIs just do, at this point
31:03
in the state of
31:05
the art. But you can also set
31:07
the top P, which is the number
31:09
of tokens it'll consider for the next
31:12
choice. So like how randomly it chooses
31:14
the next token. So if Top P
31:16
is like 25 words, it's considering 25
31:19
tokens using its temperature algorithm. And I
31:21
hope this is all accurate. If there's
31:23
AI folks out there that got it
31:25
wrong, I'm doing my best. But it's
31:28
been working so far. So yeah, that's
31:30
where we're at. So
31:32
we talked a little bit about
31:35
guardrails, but I'd like to dig into
31:37
that a little bit more. Again, guardrails
31:39
are essentially like controls on what the
31:41
model will respond to based on prompts.
31:44
And can you talk about, you know,
31:46
what kind of guardrails are available and
31:48
what you were interested in? Yeah, absolutely.
31:50
So those exist. I imagine in most
31:53
of these hyperscaler platforms, but specifically, Azure
31:55
and AWS have the concept of guard
31:57
rails, or model blocking, I think it
32:00
might be called in Azure, where you
32:02
can give it specific things that it
32:04
shouldn't talk about. Categories like profanity, like
32:06
if anyone curses at the AI, you
32:09
can block it on the input or
32:11
block it on the output. You don't
32:13
want the model cursing at people or
32:15
nudity or violence, like don't explain how
32:18
to make C4, please. Like maybe you've
32:20
been trained on that data. Please don't
32:22
explain that in the context of my
32:24
business app. Something that our legal team
32:27
in particular asked me to do was
32:29
make sure that it won't give financial
32:31
advice. Because it sort of seems like
32:34
it's speaking for the company, right? If
32:36
you have any kind of AI, like
32:38
this, that was that famous story in
32:40
Canada where there was a car dealership
32:43
that had an AI
32:45
channel of support, and it promised it
32:49
would give them a car for $10.
32:52
And they were sued, and I can't
32:54
remember how that worked out. Or: should I
32:56
buy your stock? Is it going to
32:58
go up next week? It might actually
33:01
have information that is accurate on that
33:03
question and it's also highly illegal for
33:05
it to give that information to anyone.
33:08
So we cannot do that. So something
33:10
cool that you can do on the
33:12
AWS guardrail side is you can give
33:14
it example questions. This isn't a category.
33:17
Financial advice is not a category like
33:19
profanity and nudity and violence, but you
33:21
can give examples of questions that it
33:23
should not answer. And responses that it
33:26
should give instead. So we wrote a
33:28
couple of questions like that for financial
33:30
advice for stock investment for the future
33:32
of the company in terms of growth
33:35
or sales and said that I'm sorry
33:37
I'm not authorized to speak on behalf
33:39
of this company. So just sort of
33:42
catch all responses that say like I'm
33:44
not going to actually give you this
33:46
answer. And it's interesting because it's
33:48
not trained into the model. It's not
33:51
part of the model. It's. a guardrail
33:53
that just processes every in and out
33:55
using AI. Like, it's using generative AI
33:57
as a totally separate process, as a
34:00
layer to measure your question in and
34:02
your response out to see whether they
34:04
fit your parameters of what you permit.
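[Editor's note: a sketch of what such a guardrail definition looks like, in roughly the shape boto3's bedrock.create_guardrail() accepts. Field names follow the Bedrock API as I understand it, the values are illustrative, and no AWS call is made here; it is built as plain data.]

```python
# Sketch of a guardrail: category filters checked on input and output,
# plus a denied topic with example questions and a canned refusal.
guardrail = {
    "name": "slackbot-guardrail",
    "topicPolicyConfig": {
        "topicsConfig": [
            {
                "name": "financial-advice",
                "definition": "Requests for investment or stock advice about the company.",
                "examples": [
                    "Should I buy your stock?",
                    "Is the stock going up next week?",
                ],
                "type": "DENY",
            }
        ]
    },
    "contentPolicyConfig": {
        "filtersConfig": [
            # checkbox-style categories with low/medium/high strengths
            {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        ]
    },
    "blockedInputMessaging": "I'm sorry, I'm not authorized to speak on behalf of this company.",
    "blockedOutputsMessaging": "I'm sorry, I'm not authorized to speak on behalf of this company.",
}
print(len(guardrail["topicPolicyConfig"]["topicsConfig"]))
```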
34:06
So my assumption around guardrails is it
34:09
was sort of like you know when I
34:11
get web filtering services from a security company
34:13
I can check all the boxes no hate
34:16
speech no gambling no whatever and I don't
34:18
have to go out and find all of
34:20
the URLs associated with that they're doing it
34:22
for me. I assumed it was the same
34:25
with guardrails is that the case or it
34:27
sounds like I can also program in very
34:29
specific rules. Yeah, that's exactly true is
34:31
what you said right at the end. It
34:34
both does these categories that you
34:36
don't have to train it on all the
34:38
words that qualify as profanity. You can check
34:40
the box and set it. I think it's
34:43
like low medium high. It would be an
34:45
interesting day to program that in. I would
34:47
do it. I think it'd be fun. And
34:49
you can also give it these sort of
34:52
AI generative questions and answers
34:54
that it should be providing. So
34:56
it's sort of going beyond just
34:58
what it supports to block traffic.
35:00
So it sort of works like
35:02
a WAF in the sense that
35:04
it's finding specific things, but it's also
35:07
finding similar things that qualify.
35:09
So if it's detecting that it
35:11
seems like profanity, it will be
35:13
blocked by the profanity filter, which
35:15
is pretty cool. It occasionally is
35:18
a little overzealous. Some of our finance
35:20
team wants to talk about like, how do
35:22
I find a credit card number that I
35:24
can use to check out, you know, in
35:26
our demo environment? And then it's saying,
35:28
I'm not going to give you
35:30
a credit card number, obviously. And
35:33
so we've had some edge cases
35:35
where we have to kind of
35:37
tweak it for, you know, the
35:39
bizarre things that developers have to
35:41
do to make apps actually work.
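[Editor's note: the demo-environment credit card case is a good example of the nuance a category filter misses. Checkout forms typically validate numbers with the Luhn checksum, and publicly documented test numbers such as 4242 4242 4242 4242 pass it without belonging to anyone. A quick sketch of that check:]

```python
# Luhn checksum: the validation most checkout forms run on card numbers.
def luhn_ok(number: str) -> bool:
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_ok("4242424242424242"))  # a widely used test number → True
```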
35:43
Well, you need to make exceptions
35:45
for Australians in the case of
35:47
profanity because what most of us
35:49
would consider profanity is just everyday
35:51
speech for the average Australian. Absolutely.
35:53
Maybe there's an Australian mode. I
35:55
haven't seen it yet, but AWS,
35:57
please build that. Context. Conversation context.
36:00
That's really important for that
36:02
human-like experience when chatting with
36:04
the bots. So how do we get
36:06
context? Well, that's an interesting problem
36:08
here, because you're building a conversation, which
36:11
is a series of person A, or
36:13
a user, speaking to person B, or
36:15
the system. And you can have lots
36:17
of conversation turns, but that's what you're
36:19
providing to bedrock and saying, you know,
36:22
here's the whole conversation that's previously
36:24
happened. Please generate a response
36:26
using this context. And at
36:28
first I built this to just
36:30
read everything in a direct message
36:33
thread like all of the conversation
36:35
you've had, which can be hundreds
36:37
of turns on all sorts of
36:39
topics, and the AI went crazy because it
36:42
got really confused. First of all,
36:44
there's just too much context for
36:46
it to process in a reasonable amount of
36:48
time. But also if you're asking about
36:50
topic A and then topic B and
36:52
then topic C, those are kind of
36:55
related, but passing all that information
36:57
at once to someone, you would
36:59
confuse any human with so much
37:01
context immediately and it confused the
37:03
AI right away. So I decided
37:05
to kind of bound a conversation
37:08
context, in the same way
37:10
that Slack sort of does
37:12
natively, which is called threads.
37:15
Threads are sort of these
37:17
child objects in direct message.
37:19
And so it's not like
37:22
a parent-level message, message, message.
37:24
It's a child, a child, a
37:26
child beneath a message. And so
37:29
I just read the entire context
37:31
of the thread. And we also
37:33
look up all of the user
37:35
information. So find your real name
37:37
Drew Conry-Murray and your pronouns
37:40
if you set them in slack.
37:42
So it can speak more naturally.
37:44
It was using they/them for
37:47
everyone, which was bizarre. And it's
37:49
able to, because it's reading that whole
37:51
thread and passing it forward to
37:53
bedrock, it's able to understand who's
37:55
speaking. So if Ethan and Drew
37:57
are arguing about something, it's able
37:59
to understand who has opinions about
38:01
what. And it can kind of
38:03
help settle arguments or summarize the positions
38:06
of the different people on the thread
38:08
and who agrees with who and who
38:10
thinks blah blah blah. But that's something
38:13
I didn't expect people to do. They
38:15
immediately started using it to summarize these
38:17
really long slack threads of these
38:19
two experts arguing for 50 conversations
38:21
and then you come in at the
38:24
bottom and you're like, you know, it's
38:26
that meme where you walk into the
38:28
room with pizza and everything's on
38:30
fire and you're like, what happened
38:32
here? And so you can ask the
38:35
AI, like, please read this whole thread
38:37
and tell me in 50 words or less what they're
38:39
talking about. And it can do
38:41
that because it's reading thread and
38:43
getting all the context of who's speaking
38:46
and what they've said, including any documents
38:48
that are attached. Documents are a whole
38:51
other challenging ball of wax, but
38:53
primarily that's how context is working.
38:55
That's all something that I just kind
38:57
of made up, that it makes sense
38:59
to me in Slack threads being a
39:02
conversation boundary. So let's just use
39:04
that as a conversation for bedrock.
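[Editor's note: a sketch of that assembly step, turning a Slack thread into the alternating user/assistant turns the Converse API wants. The Slack field names match conversations.replies output, but the role-merging rule and the display-name lookup are guesses at the approach described, not Kyler's actual code.]

```python
# Turn a Slack thread into Converse-style messages, tagging each human
# speaker by real name so the model knows who said what.
def thread_to_messages(thread, bot_user_id, display_names):
    messages = []
    for msg in thread:
        if msg["user"] == bot_user_id:
            role, text = "assistant", msg["text"]
        else:
            name = display_names.get(msg["user"], msg["user"])
            role, text = "user", f"{name} says: {msg['text']}"
        # Converse expects alternating roles; merge consecutive same-role turns
        if messages and messages[-1]["role"] == role:
            messages[-1]["content"][0]["text"] += "\n" + text
        else:
            messages.append({"role": role, "content": [{"text": text}]})
    return messages

thread = [
    {"user": "U1", "text": "Is the VPN to Chicago up?"},
    {"user": "U2", "text": "I think it flapped an hour ago."},
    {"user": "UBOT", "text": "Checking the last alert in this channel..."},
]
msgs = thread_to_messages(thread, "UBOT", {"U1": "Ethan", "U2": "Drew"})
print([m["role"] for m in msgs])  # → ['user', 'assistant']
```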
39:06
So that's me using the
39:08
Slack bot, you know, I'm interfacing with
39:10
Slack. I need to know: each conversation
39:13
I have with this Slack bot
39:15
needs to be threaded in order
39:17
to have context. Is that true? Yeah,
39:19
so you'll either tag it in a
39:21
thread or in a parent message and
39:24
it will respond in a thread
39:26
so it sort of guides you
39:28
towards using this model. You don't have
39:30
to memorize that. And that's the sort
39:32
of context that's passed in in
39:34
real time, and it's all built
39:36
so it doesn't keep it. But we
39:39
also have training data, the knowledge base
39:41
that it can look up. And so
39:43
we use this first phase of
39:45
the conversation where you read the
39:47
entire thread, which can be really long,
39:50
right? People get verbose in slack, or
39:52
at least I'm very chatty in slack.
39:54
And it'll look at our knowledge
39:56
base, which is all the data
39:58
we have trained it on. It's a
40:01
vector database called OpenSearch in AWS, which
40:03
is like a vector database platform. And
40:05
that's where all of our knowledge
40:07
is stored. And it sort of
40:09
finds related conversational vectors. So like related
40:12
to the topics you're talking on, in the
40:14
information you've trained it on. And we
40:16
pass that information as additional context.
40:18
And I'm just doing additional conversation
40:20
turns that say like, hey, this is
40:23
a knowledge-base entry, please use this. And
40:25
then phase two is you actually talk
40:27
to the model with that assembled
40:29
thread of the user's request, the
40:31
thread that it's in, the conversational knowledge-base
40:34
information that we've retrieved, and that whole
40:36
package is given to the AI to
40:38
say like, hey, please make sense
40:40
of this and give us a
40:42
response. A quick observation
40:45
here in your notes you mentioned
40:47
you were running bedrock in U.S.
40:49
East 1, AWS US East 1,
40:51
but it was kind of broke
40:53
and you ended up using U.S.
40:55
West 2 and it's been working
40:57
great ever since. What was your
40:59
experience with it being broke? Were
41:01
you just getting errors or strange
41:03
responses? It was just giving me
41:05
errors that my things were malformed.
41:07
My API requests were malformed despite
41:09
them exactly matching the doc. You
41:11
try to troubleshoot your code, like
41:13
maybe I've done it wrong. I've
41:15
done it wrong many times before,
41:17
but this exactly matches the document
41:19
example. And I talked to our
41:22
TAM and some friends, and they
41:24
said, well, you know, East One
41:26
breaks sometimes with bedrock. East One
41:28
is an overloaded region. There's a
41:30
lot going on there. But West
41:32
Two is, it gets the new
41:34
stuff first, because I don't know
41:36
why. So try that. And I
41:38
flipped over to that region and
41:40
it worked right away with no
41:42
code changes. I just pointed it at
41:44
a new place. So since then
41:46
I've left it. So my lambda
41:48
runs in East One and it
41:50
uses the service in West Two. That's
41:52
called cross region inference, but it
41:54
works fine and it's free. So
41:56
I just kind of left it
41:58
there. It's a little awkward jumping
42:01
around the console regions to read
42:03
the logs for different services, but
42:05
it's not annoying enough for me
42:07
to fix it, so it stays. Speaking
42:09
of free, you mentioned earlier that
42:11
overall this project's been pretty inexpensive,
42:13
but I am scared to death
42:15
of running up my AWS cost.
42:17
Can I, is there a way
42:19
I can guard against costs getting
42:21
out of control if I'm using
42:23
bedrock or lambda? Yeah, absolutely. You
42:25
can set alerts in your Cost
42:27
Explorer that trigger and will email
42:29
you if you're beyond like $10
42:31
a month or $20 a month,
42:33
or if your projection is higher
42:35
than that. But generally in Bedrock,
42:37
it's so inexpensive, I would recommend
42:40
you could try it. And Lambda
42:42
similarly costs almost nothing. I think
42:44
Lambda costs like a dollar a
42:46
month for 400 requests. The real
42:48
cost is the knowledge bases. You
42:50
have to be very careful with
42:52
that. I trained it on about
42:54
40 gigabytes of confluence data. And
42:56
you would expect storing 40 gigabytes
42:58
in a database would cost, you
43:00
know, maybe $100 a month. I
43:02
don't know. That's a napkin math.
43:04
It's around $1,200 a month. So
43:06
it's... it's like $25,000 a year
43:08
or something like that is what
43:10
it initially cost. I've been fiddling
43:12
with it to get the math
43:14
down and we're still around like
43:16
14 grand a year to store
43:18
40 gigabytes of data in a
43:21
database. So that is significant,
43:23
especially for an internal tool
43:25
that's not generating revenue. I'm just
43:27
in a cost center building stuff. And
43:29
that's... so, be very wary of
43:31
knowledge bases because they're very expensive,
43:33
but the lambda and the bedrock
43:35
so far have cost almost nothing,
43:37
a couple of Starbucks a month
43:39
and you're good to go for
43:41
an AI bot. In your
43:43
series, as you've been writing about
43:45
this, you made a big deal
43:47
about Lambda being all about you
43:49
don't want to have to think
43:51
about infrastructure ever, and
43:53
so, Lambda. But let's say I'm
43:55
okay with managing some infrastructure,
43:57
I've got a server lying around. Out
44:00
of your architecture, what you've designed here, what
44:02
processes would be running on
44:04
a server, and would running on a server
44:06
simplify this thing, or am I
44:08
just kind of moving complexity around?
44:10
I think you're moving complexity
44:12
around a little bit, for a couple of
44:16
reasons. So first of all, it has to
44:18
be exposed to the internet somehow because the
44:20
slack servers are on the internet. So either
44:22
you need something like an ngrok that's doing
44:25
this sort of piping of public to private
44:27
to get to your server, or you need an
44:29
ALB to receive the traffic, which is
44:31
going to cost you more than bedrock
44:33
is costing per month, even with no
44:35
use. I think it's $16 a month,
44:37
even if you have no services at
44:39
all. And you also need to
44:41
handle the authentication, because you're using
44:43
IAM authentication. I was going to
44:46
say it expired after a few months,
44:48
but with an implicit IAM role, I
44:50
think that would actually be solved. So
44:52
you would just have to handle ingress,
44:54
and it would work just fine. I think
44:56
that's all. Yeah. There's a main function in the code.
44:59
I think in the show notes we'll link you to
45:01
the GitHub if you want to check it
45:03
out. There's a lambda handler and there's a main
45:05
function handler and they're written in such a
45:07
way you can just run the code and
45:10
it'll detect your sort of context. And if
45:12
you're just running on your computer and you
45:14
have all the things installed, it'll work
45:16
fine. As opposed to the webhook
45:18
reaching out to that Lambda URL and
45:20
firing it up, the
45:22
webhook would instead hit my server
45:25
instance, and it would run from
45:27
there, or just be sitting there
45:29
live, waiting. Yep, absolutely. So it's
45:31
an either-or. You've built something that's
45:34
standalone no infrastructure required don't have to
45:36
upgrade don't have to maintain and it's
45:38
pretty cheap I looked at it not being
45:40
someone who spends much time in cloud or
45:42
writing Terraform, and I've never written a
44:45
Lambda function in my life, going, ah, this
45:47
all seems a little intimidating, but like a
45:49
service running on a server, that I know,
45:51
that I'm really comfortable with. But as you
45:53
say, it is just moving things around. Now
45:55
I've got, now I've got a process living on
45:57
a server, and now I've got to give it care
46:00
and feeding. Yeah, absolutely. It becomes
46:02
a pet. The Lambda version is
46:04
cattle. If it goes crazy and throws
46:06
an error, we kill it and we get
46:08
a new one and that's an unfortunate metaphor.
46:10
But if it's on a server, it
46:12
has to be your pet. You're monitoring
46:14
the CPU. You make sure the
46:16
disk doesn't fill up. Have you
46:18
patched it recently? You probably should.
46:20
Do we have any kind of
46:22
SRE infrastructure in place to monitor? Is
46:24
the CPU getting high, and is it slowing down
46:27
because it's handling too much? Do we have
46:29
antivirus on it? You just sort of have
46:31
to handle all of that stuff in this
46:33
sort of pet world where you have your server
46:35
and you have to care and feed it. And
46:37
then the final follow-up question for that
46:39
you've talked about ngrok. And it sounds
46:42
like it's a gateway for to go
46:44
between a public network and a private
46:46
network, kind of. If we haven't heard
46:48
of ngrok before, what is this thing?
46:50
What does it do? Totally. I hadn't
46:53
before I built this project, but I
46:55
had this public web hook coming from
46:57
Slack because you're messaging in and that's
46:59
what happens when you tag your bot.
47:01
It generates a web hook and it
47:04
sends it somewhere. And I had to
47:06
get it to my computer, which is,
47:08
you know, inside my internal network. I
47:10
didn't want to give myself a public
47:12
IP or anything. And I just have
47:14
a silly router, not a Cisco ASA
47:17
or something to do like a static
47:19
NAT to shield me from the internet. So I
47:21
needed to get that to my computer.
47:23
And this ngrok service, it's an
47:25
open source tool and platform that
47:27
lets you do one URL forwarding
47:30
concurrently to your private computer and
47:32
it sort of builds a tunnel
47:34
from the ngrok service to
47:36
your computer from public to
47:38
private. And it gives you some
47:40
insight into each HTTP connection, like
47:43
the code that you're receiving and returning.
47:45
So like a 200 is a happy
47:47
little HTTP packet. And you can see the
47:49
latency and how much traffic you're getting
47:51
and stuff like that. And it's
47:53
just this cool little open source
47:55
tool and platform that makes
47:57
developing locally against a publicly
48:00
generated webhook super easy.
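[Editor's note: ngrok itself is the tunnel, roughly `ngrok http 3000` on the laptop, but the local side it forwards to is just an ordinary HTTP listener. A self-contained sketch of such a receiver handling Slack's url_verification handshake; the port and route are illustrative, and the demo sends itself one request instead of waiting for Slack.]

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class SlackWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Slack's URL verification handshake: echo the challenge back
        if body.get("type") == "url_verification":
            reply = json.dumps({"challenge": body["challenge"]}).encode()
        else:
            reply = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), SlackWebhookHandler)  # 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Simulate the webhook Slack would send through the ngrok tunnel
port = server.server_address[1]
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/slack/events",
    data=json.dumps({"type": "url_verification", "challenge": "abc123"}).encode(),
    headers={"Content-Type": "application/json"},
)
response = json.loads(urllib.request.urlopen(req).read())
print(response["challenge"])  # → abc123
server.shutdown()
```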
48:02
I doubt it's secure enough for
48:04
an enterprise implementation. I
48:06
wouldn't build your whole bot like that.
48:08
But you probably could. Maybe you
48:11
could run ngrok to get your
48:13
public access to your local development
48:15
environment. Try it out. The key
48:17
piece of it, it sounds like you said there's
48:20
an ngrok service. So there's some ngrok service
48:24
living out in the cloud that's going to be
48:26
basically a proxy. I'm going to send my web
48:28
hook, it's going to land on the ngrok service,
48:30
which is going to go, oh, I know where
48:32
this goes, it's on the other side of this
48:35
tunnel, to Kyler's, sitting inside of her enterprise, and
48:37
sends it there. Exactly, and it's listening
48:49
on localhost, port 3000 or something
48:49
like that. And you just tell
48:51
the ngrok, like, accept traffic on
48:54
443, securely, and then tunnel it
48:56
securely to me, and drop it on
48:58
port 3000 localhost. And so your
49:00
Python script receives the traffic. It's wild,
49:02
because it's so complex. But I probably
49:05
spent 15 minutes googling it, and then
49:07
I turned it on, and it worked
49:09
right away, and I've never had an
49:11
issue with it. It's
49:14
incredible for really
49:16
rapid development. So we talked earlier about,
49:18
you know, the model you chose and
49:20
it was from Anthropic, a Claude version.
49:23
But you also, so that's, that Claude
49:25
version is trained on some essentially public
49:27
data set, but you wanted to augment
49:29
it with internal data, which you
49:31
said you're coming from confluence. Is
49:33
that what RAG means? Retrieval-augmented
49:35
generation? Is that what this was
49:37
or something else? That's exactly what
49:39
it is. So for folks that,
49:42
you know, aren't AI engineers,
49:44
rag is retrieve and generate
49:46
or retrieval augmented generation, which
49:48
means using AI to construct
49:50
vectors. So you take this unstructured
49:52
data, which is the slur
49:55
that AI engineers use for stuff
49:57
that's written for humans, like you
49:59
have a document, you have a
50:01
chart, you have a, you know, your
50:03
Excel spreadsheet that's written for you to
50:05
understand it. It's not written for an
50:08
AI model to understand it and sort
50:10
of convert that data to a format
50:12
that's understandable by a, you know, a
50:14
vector database by a model. But there
50:17
are models that are specifically an embedding
50:19
model. That's what it's called. They're specifically
50:21
built to take unstructured data and
50:24
store it in vector databases in
50:26
a format that's compatible with models. And
50:28
so it read all of our confluence.
50:31
It also supports S3 on the AWS
50:33
side and you can upload whatever. Cool
50:35
side benefit of that is when you
50:37
upload stuff, it triggers the knowledge base
50:39
to read it right away versus the
50:42
confluence side. You have to go click
50:44
the button that says read confluence again
50:46
today, which is, I hope they solve
50:48
scheduling in the future, but right now
50:51
you have to click a button to
50:53
say read it again. And there's
50:55
also others supported, like SharePoint.
50:57
So I'm just going to
50:59
keep scaling this. The way that we're
51:01
starting to frame this project internally is
51:03
all of this data is already accessible
51:06
to all of our users. You
51:08
can go to confluence yourself and
51:10
read the website or SharePoint or
51:12
Slack or our PDFs for our
51:14
customer service agents or our PagerDuty
51:16
resolutions for our SREs.
51:18
And all of those services individually
51:20
have AI models you can pay for. A lot
51:22
of companies have risen to this, but
51:24
they're all seat licenses and they're all
51:26
separate. And so if you're paying for
51:29
like $10 a month per user per
51:31
platform, that's like, I don't even know,
51:33
the math just goes crazy. So
51:35
even if we're paying $14,000 a year
51:37
to have this knowledge base exist, if
51:39
I can put data from all of
51:41
these disparate services in one place, then
51:44
this model can make some pretty
51:46
informed decisions if it can read
51:48
your pager duty and your share
51:50
point and your confluence and your
51:52
slack and maybe all your PDF
51:54
of, like, how to resolve stuff.
51:57
So that's, I think, the pitch:
51:59
what could an AI model do
52:01
that's very accessible for your users and
52:04
private and has read all of your
52:06
internal infrastructure documents? And maybe your configs
52:08
too. I don't know what we can
52:11
train it on, but we're going to
52:13
put a lot in there and see
52:15
what happens. And just a side note,
52:18
we've been talking about vector databases. I
52:20
didn't know what that was. I had
52:22
to look it up before the show.
52:24
So my understanding is that the thing
52:27
that's cool about a vector database is
52:29
that it can sort of... If I
52:31
put in a query about smartphones, it
52:34
will return information that it found related
52:36
to also mobile devices and cell phones,
52:38
as opposed to just keying off the
52:41
specific word smartphone. That's the benefit of
52:43
a vector database. Is that correct? Yeah,
52:45
that's my understanding too. Keywords work the
52:47
best. It still shows pretty solid deference
52:50
for like an exact match of a
52:52
keyword. But yeah, it's finding related topics
52:54
and in much the same way your
52:57
brain would when we say phone, you
52:59
think like Android, iOS, blah, blah. Yeah,
53:01
the long-lost Blackberry. Yeah. So, getting
53:04
your Confluence database into the model, did
53:06
you have to do anything special to
53:08
prepare the data or was just like,
53:11
here you go and you take care
53:13
of it? No, it worked really well.
53:15
This is a beta data source that
53:17
is supported by AWS Bedrock's team. So
53:20
you create a knowledge base and then
53:22
you add data sources to it. The
53:24
knowledge base basically is one-to-one with an
53:27
OpenSearch database that's just running in
53:29
the background charging you money. And the
53:31
data sources are these sort of scheduled
53:34
automated processes to go scrape something and
53:36
put the data, shove the data into
53:38
your OpenSearch database, into your knowledge
53:40
base. And so you can add lots
53:43
of data sources into one knowledge base.
53:45
So confluence is supported. It's still in
53:47
beta. It still doesn't read a lot
53:50
of your specially structured data. So bedrock
53:52
itself supports PDFs and documents and Excel
53:54
and so on. But this specific ingestion mechanism
53:57
for confluence doesn't. So each time I
53:59
run it, it shows 80% failure. It's
54:01
like 200,000 failures of 250,000 documents, like
54:03
70 or 80%. And it still works.
54:06
It's ingested all the text, but it
54:08
doesn't ingest the binary files, the structured
54:10
data files. So hopefully that's coming. I'm
54:13
sure that'll be solved by them in
54:15
the future. But what we're gonna do
54:17
for other data, like... PDFs that we
54:20
upload to S3 or data we're scraping
54:22
from internal wikis and putting in there
54:24
is this structured data file type where
54:27
you say the source is this URL
54:29
even if it's not accessible to the
54:31
model. You're just storing it in S3.
54:33
And so when your model references that
54:36
data, you say, read this thing, and
54:38
it says, oh, you know, it comes
54:40
from this PDF. It doesn't say,
54:43
go read the PDF in the S3
54:45
human person. It says, go to this
54:47
URL that you have access to, this
54:50
internal wiki URL that it hasn't read, because
54:52
you can access it there as a
54:54
human, but our model is not able
54:56
to support that yet. So that's probably
54:59
how we're going to support a lot
55:01
of internal data that's private. That means
55:03
duplicating data, which isn't great, but being
55:06
able to put it in this model
55:08
is pretty cool and powerful. So I
55:10
think we're going to explore that. With
55:13
that data ingestion that you were just
55:15
describing, rag and augmenting, the core model
55:17
with all of this specific data that's
55:19
specific to your organization. I'm a network
55:22
engineer and I want to be able
55:24
to ask the model about the state
55:26
of the network in real time. Can
55:29
you imagine a scenario where this, I
55:31
don't know, some kind of a telemetry
55:33
feed or something where the model can
55:36
kind of keep up with the state
55:38
of the network? And so then I
55:40
can ask questions and it'll tell me
55:43
what's going on in New York, you
55:45
know, these kind of things. Yeah, absolutely.
55:47
55:47
There's so much to this question, and there's so much I don't know yet that I want to test out. But generally, these types of models are trained on general data. They're not trained on the state of the network. They're trained on how Terraform works, what the AWS service names are, et cetera. And you can pass it information in real time, like, hey, we got an alert in our Slack that a VPN went down, can you tell us what to do? But it's not reading your config. It's not SSHing into your firewall and looking at state data. It's just reading the log in Slack and giving you basic information about it, which is not great and not what we want.

56:29
So there is this concept you'll see if you start reading about AI stuff called an agentic AI agent. And it's a very fancy term that just means you give the model the ability to do stuff. So I can absolutely see a use case for asking the model, is the VPN to Chicago up on this firewall? And giving it the ability to, in the background, SSH to your device, read your list of firewalls, look for one of the tunnels that has a description of Chicago, and see its state. That will all take some sort of custom building. I haven't gotten to building agentic stuff yet, but that's supported by Bedrock, that's supported by Azure AI, and I imagine it's supported by GCP. That's going to be the next generation of AI stuff. It's still very, very new.
57:17
57:17
It will develop significantly in the next year or two, and hopefully we'll have some sort of pre-built puzzle pieces for us to SSH to something from an internal endpoint, read the data, and add it as a conversation turn so the AI can understand it. But that's for sure coming.

57:35
One of the internal projects that I'm going to be building in a hack day pretty soon is to give an AI model an internal network architecture diagram, so like a PDF that shows all the system names and hosts and subnets and stuff like that. And then hopefully a user will be able to talk to the bot and say, my IP is this, I'm going to this destination IP on this port number, is it accessible? And what I'm hoping the AI will be able to do is understand where the user is in the network diagram and where the destination is, and then look at all the interstitial nodes in the middle and read their configuration. Because if we've put the configuration into, like, an S3 bucket — this isn't real time, by the way, this is like maybe having your RANCID open-source config backup tool dump it into S3 — then it can read from S3: is it currently permitted? Because how much time as a network engineer do you spend with people saying, you know, my host can't get to the thing, should I be able to get to the network that's doing this? So if you can have a bot that answers all those questions, imagine how much time you can get back. So I have no idea if that will work.
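The per-node "is it currently permitted?" check described above can be sketched as a first-match ACL evaluation over rules parsed from a config backup. Everything here is an illustrative assumption — the rule format, networks, and ports are invented, and a real version would parse them out of the RANCID dumps in S3:

```python
from ipaddress import ip_address, ip_network

# Hypothetical rules parsed from one interstitial node's config backup.
# Each rule: (action, source network, destination network, destination port);
# a port of None matches any port.
RULES = [
    ("permit", "10.1.0.0/16", "10.2.0.0/24", 443),
    ("deny",   "0.0.0.0/0",   "0.0.0.0/0",   None),  # implicit deny-all
]

def is_permitted(src: str, dst: str, port: int, rules=RULES) -> bool:
    """First-match ACL evaluation: the check a diagram-reading bot would
    run for each node on the path between the user and the destination."""
    for action, src_net, dst_net, rule_port in rules:
        if (ip_address(src) in ip_network(src_net)
                and ip_address(dst) in ip_network(dst_net)
                and (rule_port is None or rule_port == port)):
            return action == "permit"
    return False  # no rule matched: default deny

print(is_permitted("10.1.5.9", "10.2.0.20", 443))  # -> True
print(is_permitted("10.1.5.9", "10.2.0.20", 22))   # -> False
```

The bot's job would be to run this check against every node the diagram puts between source and destination, and answer "no" with the first node that denies the flow.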
58:49
58:49
I hope that it will work. Ask me in a couple of weeks after we have this hack day and we'll let you know whether we've succeeded.

58:58
I suspect it's a challenging problem, in that there are companies out there with products that do this, and it's taken them years to develop them to be robust enough for an enterprise use case. Forward Networks comes to mind as folks in this space doing this kind of stuff. But I'm intrigued; I really want to know where this goes. Another question is related to cost. We were talking about cost before, and you said, hey, the big cost comes in when you deal with that knowledge base. Is that what we were just talking about with what you do with Confluence, or any time you're ingesting RAG-style data? Is that where that cost is going to come in?

59:37
Yeah. It's funny, because even if you're just reading, like, one PDF, it still has to spin up the whole infrastructure of a knowledge base, which is actually an OpenSearch database in the background. And its minimum cost is around $17,000 a year. And that's huge, even if you're reading one PDF and training it on one PDF. So that's just not great. I'm hoping that as these technologies mature, we'll get to the point where it's much cheaper, and you're charged based on, like, the number of tokens ingested, or something more correlated to the amount of data. Because training on one PDF shouldn't cost $18,000 a year. That's just unreasonable. And I think our AWS team is understanding of that. We've been able to make one change to bring the cost down about 35%, which is helpful, but I want it to be a couple hundred dollars a year, or something more correspondent to the value it's generating for the business. And we're just not there yet. You have to really commit as an enterprise. And if you're, like, a mom-and-pop shop, you can't spend 18 grand a year on this widget.
1:00:44
1:00:44
Like, it's just not reasonable. So hopefully we'll get there soon.

1:00:47
Well, Kyler Middleton, thank you for sharing all of your experience and knowledge on this. This was absolutely fantastic. And if you're listening and you want to get into the details, you want to see everything, all the code, all the Terraform, et cetera, that Kyler's been working with, that's all at letsdodevops.com, which is her Substack.

1:01:07
I'm very active on LinkedIn. I host Day Two DevOps with Ned Bellavance on the same Packet Pushers network that you're on now, so please come check us out. And I just get around to as many conferences as will have me. I'm hopefully going to be in Philadelphia later this year at re:Inforce, so look for me there. If you do want to read more about this AI stuff on letsdodevops.com, I have a coupon code to read the stuff that's still behind the paywall. It's all becoming free, but letsdodevops.com/heavynetworking, with no space or dash or anything, will get you a free month trial to go read it all, copy it to your desktop, get that stuff down. And all the code is free on GitHub; it's all linked from the Substack, so you can do this in your own enterprise. That's letsdodevops.com/heavynetworking.

1:01:58
I didn't know you were going to do that until just this second. That's awesome. Kyler, thank you for that. Seriously. Again, Kyler gave me a couple of freebies so I could read and research for this podcast without having to sub to the Substack, but I will tell you: subscribe to the Substack, it's that good. It's really, really valuable information if you're at all interested in this stuff.

1:02:23
Anyway, thank you for listening to Heavy Networking today from the Packet Pushers podcast network. It's all content for your professional career development. And just some quick housekeeping items as we close. Merch: go to store.packetpushers.net, and don't overlook the collections link in the header of the store. We've got stuff for every show in the podcast family. We have a newsletter — we have multiple newsletters, but I'm going to focus on the Human Infrastructure newsletter today. Drew and I publish that every week. We share the best blogs, news, vendor announcements, resources, and of course memes that we have found. Everything you need to know from the world of networking and tech, sent to your inbox with love.

1:03:02
AutoCon 3 is our last housekeeping note today. AutoCon is the industry's only conference devoted to network automation, and it is coming to Prague in late May 2025. The Packet Pushers team is going to be there, and we would love to see you. Visit networkautomation.forum and get a ticket for AutoCon 3 while you still can. This event does have an attendance cap, and it will sell out; it's just a matter of time. So if you're interested in going abroad for AutoCon 3, go buy your ticket at networkautomation.forum.

1:03:37
And if you enjoyed our conversation with Kyler, you again should subscribe to her podcast, Day Two DevOps, with Ned Bellavance, and her Let's Do DevOps Substack. That would be pretty swell of you. If you have comments or questions about this show, send them to us via packetpushers.net/follow-up. And until next week, just remember: too much networking would never be enough.