Episode Transcript
0:00
Today on the AI Daily Brief, a
0:02
case study in building voice agents. The
0:04
AI Daily Brief is a daily podcast and video
0:06
about the most important news and discussions in AI. To
0:09
join the conversation, follow the Discord link in our
0:11
show notes. Today
0:18
we're doing something a little bit different and
0:20
that I'm very excited for. As you guys
0:23
might have heard, over the last six months,
0:25
our team at Superintelligent has been working
0:27
on a voice agent that is effectively the
0:29
core of a new type of automated consultant
0:31
that we deploy as part of our agent
0:33
readiness audits. Agent readiness audits
0:35
are a process whereby we go in
0:37
and interview people inside companies about A,
0:40
all of the AI activities and agent
0:42
activities they're currently engaged in, as well
0:44
as B, just their work more broadly. The
0:47
goal is to benchmark their AI and agent
0:49
usage relative to their peers and competitors, as
0:51
well as to map the opportunities they have
0:53
to actually deploy agents to get value. A
0:56
core part of how we do this
0:58
is a voice agent that we've developed
1:00
that can interview dozens, hundreds, or thousands
1:02
of people at the same time, on
1:04
their time, 24/7, totally unlocking
1:06
a differentiated ability to capture information
1:08
compared to anything that consultants have previously
1:10
had. Today, we're talking with our
1:12
partners at Fractional who have been helping us build
1:15
this technology to do a bit of a case study
1:17
in what it looks like to actually build a
1:19
voice agent. It's been a really fascinating process and we're
1:21
excited to share a bit of the learning, especially
1:23
because we think that this is a technology that many
1:25
of you are probably going to deploy for your
1:27
own purposes in the months or years to come. All
1:30
right, Eddie, Chris, welcome to the AI
1:32
Daily Brief. How you doing? Doing
1:35
great. Awesome. Thanks for having
1:37
us. Yeah, this is going to be a fun one.
1:39
I mean, so this is something where we're talking about
1:41
something that you guys have built, you know, lots of
1:43
versions of we have built together. And I think that,
1:45
you know, this is a little bit different than our
1:47
normal content, because as opposed to just talking about, you
1:49
know, what's going on in markets theoretically or what people
1:51
are building theoretically, we're actually talking about something that we've
1:53
got live, that we've done some reps
1:56
on. Let's put it that way. So I think just
1:58
to kick it off, maybe if you guys could give a
2:00
little bit of background on Fractional and
2:02
yourself, just so people have that context before
2:04
we dive in. Yeah, so
2:06
I'm Chris, CEO and co-founder here
2:08
at Fractional. The basic
2:10
thesis behind the business is that
2:12
one of the biggest winners of this
2:14
whole AI moment is going to be
2:17
non-AI businesses, your everyday company that
2:19
can use gen AI to improve its
2:21
operations and improve its products and services,
2:23
and that those companies need help. They
2:25
especially need help from top-caliber
2:27
engineers who can wrangle this magic
2:29
hallucinating ingredient into production-grade systems.
2:31
And so the purpose behind Fractional
2:33
is to bring those engineers together
2:35
in one room, have them all
2:37
work on gen AI projects and
2:39
learn best practices from each other and build out
2:41
the best gen AI engineering team in the world.
2:43
And so that's been very much the vision from
2:45
day one. And it's going exactly according
2:48
to plan, which is always fun with a
2:50
startup. And I think it's the first time in our
2:52
entire careers where that's the case. So it's been
2:54
great. And working with you and your team on
2:56
the voice agent has been really fun. Awesome.
2:59
And Eddie, maybe we can actually bring
3:01
you a little bit with my first question just
3:03
to set up. So I think that the
3:05
main thing we want to do today is actually
3:07
talk about what it looks like to, you
3:09
know, put a voice agent into production. You
3:11
know, I think we learned a, we have
3:13
learned a bunch of things. We continue to learn
3:15
things in practice, but maybe to kick off,
3:17
I think just zooming out, one of the big
3:19
questions that we always deal with when it
3:21
comes to enterprise customers, enterprises that are thinking about
3:23
AI transformation is this buy build question. Right.
3:25
And I wonder, you know, you guys are on the
3:27
front lines dealing with this. Is this even
3:30
the right way to think about things at
3:32
this point? You know, especially when it
3:34
comes to agents, is there actually like a
3:36
strict buy build hierarchy? Is everything just
3:38
some spectrum of build? What do you think
3:40
the current state of buying versus building
3:42
is with agents, especially as companies are thinking
3:44
about what it means to even enter
3:46
the agent space? Yeah, I think
3:48
it's right that everything exists somewhere on the spectrum.
3:50
I think it's pretty rare that you have
3:52
a workflow that's a good fit or a product
3:54
feature that's a good fit for an
3:56
agentic solution where you can just go buy something off
3:58
the shelf that just works. The off the shelf stuff
4:00
is great for really general purpose productivity tools
4:02
and like, you know, things like deep research
4:04
that are sort of generalized tools are
4:07
like awesome. But when it comes
4:09
to, you know, specific bespoke
4:11
workflows in your business, I
4:13
think there's a spectrum of are we building
4:15
all the way from scratch? Are we building
4:17
on top of good, powerful new primitives that
4:19
are coming into the market? Are we doing
4:21
some building work that requires just sort of
4:23
integration of off-the-shelf tools? But I think
4:25
it's rare that we see great fits of
4:27
sort of off -the -shelf tools that really replace
4:29
an existing manual workflow. Yeah,
4:32
and this has sort of been our experience as
4:34
well. Everything is to some
4:36
extent built, even if it's only customized.
4:38
And so with that as background, you
4:40
know, you guys have now had a chance to
4:43
spend a bunch of time, you know, thinking about voice
4:45
agents, digging into voice agents. There
4:47
clearly seems to be resonance with voice agents
4:49
in the market. A lot of people
4:51
are finding a lot of different use cases.
4:54
Do you have a thesis for why that is
4:56
or what you attribute that to? I
4:58
think the technology has just gotten a lot
5:00
better and I think the applications are
5:02
obvious. Any business that has some kind of
5:04
call center or has some kind of
5:06
bottleneck in their business that is voice related
5:08
is looking in the direction of this
5:11
technology because I think the applications are broad
5:13
and obvious. And the
5:15
technology is finally there. If you have an experience
5:17
of talking to one of these things in the
5:19
wild, I've only had a few
5:21
thus far, but they're starting to become more
5:23
frequent. And every time I'm always impressed by
5:25
what a pleasant experience it is as a
5:27
consumer. And so I think we're just
5:29
going to start seeing these things pop up everywhere. Also,
5:32
voice is just a great fit
5:34
for certain kinds of data
5:36
collection, basically. You know, I
5:38
think you'll see it in the
5:40
use case we're going to dive into in a minute,
5:42
Super's use case. You know, there's a reason
5:44
why when you go to do research about what's going
5:46
on inside of a big company, one of the
5:48
things you do is you go in and you interview
5:50
people and you ask them questions instead of just
5:52
like sending them a survey, you know, that sort of
5:54
fixed data entry kind of task is not a great
5:57
fit for a lot of kinds of situations where
5:59
you want big open -ended responses and you want
6:01
people to sort of ramble and, you
6:03
know, realize their thinking on the fly, things
6:05
like that happen really naturally over voice.
6:07
And to Chris's point, finally, the technology is
6:09
at a place where we can start to chip
6:11
away at the kind of stuff that only a human
6:13
interviewer could have done before. Yeah. I
6:15
mean, I think it's interesting. So
6:17
for background, we're going to talk
6:19
about, you know, the voice agent
6:21
that we've been collaborating on is this
6:23
sort of data collection experience, right?
6:26
It is meant to capture information around
6:28
people's current workflows, their current AI,
6:30
you know, adoption techniques in order to
6:32
help us give them recommendations around
6:34
what agent opportunities they have. That's the
6:36
core idea. And the starting point,
6:38
the central sort of genesis of this
6:40
was that, A, to your point,
6:42
Chris, the technology was such
6:44
that it actually just is good enough to
6:46
do this, right? You can actually have an agent
6:48
interview people and it does a pretty good
6:50
job. You know, not off the shelf, as we'll
6:52
see. You know, we had to do a
6:54
lot of kind of development to make it work.
6:56
But still, the capabilities are there. The second
6:59
piece, and I think this is the piece that
7:01
you were speaking to, is it is actually
7:03
not just as good an experience as the human
7:05
equivalent. There is a lot to
7:07
recommend this as a better, an actual, just
7:09
factual, better experience. First, the fact that
7:11
you can collect information with voice and having
7:13
people talk instead of people type, just
7:15
instantly, it's so much easier for many, many
7:17
people, if not most people to ramble
7:19
about something and just speak at it, than
7:21
to sit down, try to collect their
7:23
thoughts, try to structure it and type it.
7:25
And it's faster, no matter what, right?
7:27
Just the amount of information
7:29
per unit of time is going
7:31
to be way, way higher if
7:33
you're having people talk. So that's one.
7:35
Second, the ability to do that on demand,
7:37
on your own schedule, wherever you are, maybe
7:40
if you're walking to work, whatever, like
7:42
4 a.m. at night when you can't
7:44
sleep, as opposed to having to schedule
7:46
a human interview. Again, that's
7:48
not a 1x improvement. That's a
7:50
10x improvement in convenience. And so
7:52
I think those two things combined, both
7:54
the fact that the technology is there and
7:57
it's actually just a better potential experience
7:59
makes a huge difference. You know, certainly that's sort
8:01
of like what the insight was that we had going
8:03
into it. Yeah. In addition to
8:05
that, you don't have to hire out a team
8:07
of thousands of consultants in order to conduct the
8:09
kind of interviews that you guys want. Yep.
8:12
In fact, it's interesting to, uh, you know, maybe
8:14
to come back to this, but
8:16
you know, I've had a lot of conversations with
8:18
consultants after having built this. And on
8:20
the one hand it's fairly disruptive to at least
8:22
a piece of what they're trying to do,
8:24
right? This is something that consultants bill lots and
8:26
lots of money for, doing this data
8:28
collection. Interestingly, what I
8:30
keep coming across is
8:32
consultants don't see their
8:34
value, their primary value
8:36
as collecting information. It's
8:39
like the proprietary knowledge and experience they have, the
8:41
way that they analyze it. So they're actually extraordinarily
8:43
bullish. Like they don't want to have
8:45
to force their customers to use a huge
8:47
portion of their budget just on the
8:49
data collection, they'd much rather have that be
8:51
able to go to the actual processing,
8:53
the analysis, what they do next with it.
8:55
Right. So even though this sort of
8:57
piece is actually theoretically disrupted by this, I
8:59
think it's likely to shape how we
9:01
see that industry evolve as well. I
9:03
think there's also just a whole breadth of
9:05
insights that are probably not being captured in a
9:07
lot of those consulting scenarios just because you're
9:10
limited by only being able to do whatever, 10
9:12
interviews or something like that. Whereas what could
9:14
you learn if you could actually do 1 ,000
9:16
custom interviews in parallel and be able
9:18
to actually process the data coming
9:20
back from that? Yeah, the
9:22
point about this not being what the consultants
9:24
want to be doing, too. That
9:26
is something we see broadly across basically
9:29
every project that we do. It's
9:31
the repetitive work that takes away
9:33
from the higher order tasks that you want
9:35
to get to on your to-do list and
9:37
don't have time to get to that AI
9:39
is so well suited for, and very often
9:41
we find that exact kind of dynamic. We're
9:43
automating away the things where people just
9:46
bang their head against the wall, do
9:48
this a bunch of times, and it's not
9:50
super intellectually stimulating, that kind of stuff. We
9:52
can delegate, whether that's voice or
9:54
text and free up people to do
9:56
higher order tasks. Awesome. Well,
9:58
let's dive in and talk about what it looks
10:00
like to actually build a voice agent in practice and
10:03
what we've learned. So Eddie, you know, I'm not sure
10:05
exactly what the right place to start is, but I'll
10:07
let you take it away from here and
10:09
dig into it. Yeah, absolutely. So,
10:11
you know, I think you
10:13
sort of called out correctly earlier that like
10:15
the technology is there. But that
10:17
doesn't mean it just works off the shelf or that you
10:19
don't need to do a bunch of custom work here. And
10:22
so the technology in this use case that
10:24
we really leaned on to build this interview
10:26
agent. And by the way, the way this
10:28
agent actually works in practice is we configure
10:30
it with sets of interview questions and goals.
10:32
So here are the things we want the
10:34
person to be asked. Here are the reasons
10:36
why we're asking them. We prioritize those goals.
10:38
And that's kind of the input to this
10:41
very agentic system that is then in
10:43
charge of deciding how exactly do I phrase
10:45
these questions? When do I follow up?
10:47
What do I ask next? When have I
10:49
met my goals? And so
10:51
it's got a lot of agency.
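To make that configuration concrete, here's a minimal sketch in Python of what a prioritized question-and-goal spec like the one described might look like. The field names are hypothetical, not the actual schema used in the project:

```python
# Hypothetical interview spec: questions plus prioritized goals.
# The agentic runtime, not this config, decides phrasing, follow-ups,
# and when a goal has been satisfied.
INTERVIEW_SPEC = {
    "questions": [
        {
            "id": "current_ai_usage",
            "prompt": "What AI or agent tools do you use in your work today?",
            "goals": [
                {"goal": "Name the specific tools in use", "priority": 1},
                {"goal": "Understand which tasks they are used for", "priority": 2},
            ],
        },
        {
            "id": "manual_workflows",
            "prompt": "Walk me through a repetitive part of your week.",
            "goals": [
                {"goal": "Capture at least one concrete workflow end to end", "priority": 1},
                {"goal": "Estimate time spent per week", "priority": 3},
            ],
        },
    ],
}
```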
10:53
It's highly sort of undirected. And
10:55
the kind of out -of -the -box technology
10:58
that we have access to right now,
11:00
and there's a few different alternatives here,
11:02
but the one we chose for this
11:04
project was the OpenAI real -time API, which
11:06
has great real -time voice capabilities. It's
11:08
got nice realistic voices that sound
11:10
pretty human, and it's pretty smart in
11:12
its ability to make decisions on the fly.
11:15
If you just give a monolithic prompt to
11:17
that model that tells it about the
11:19
interview and the questions it might want
11:21
to ask, you get a pretty cool
11:23
result, but it goes off the rails
11:25
all the time. It asks weird questions.
11:27
It's hard to tune when it follows
11:29
up. If your only mechanism for control
11:32
here is a giant monolithic prompt, your
11:34
hands are really tied. And so
11:36
we quickly found that while it ran some
11:38
interviews well, it ran some interviews really poorly, and
11:41
our control over what happened next was
11:43
pretty limited. And so one
11:45
of the areas where it fell down
11:47
was... It didn't always make smart choices about
11:49
what question to ask when. We would tell
11:51
it all the questions up front. It would
11:53
be up to it to decide which one
11:55
is next. And so what we ended up doing
11:58
is abstracting out an entirely out of band
12:00
sub agent that's running in
12:02
parallel in the background, assessing the
12:04
conversation. And its whole task is like,
12:06
if we were to move on to another question right
12:08
now, which one should we move on to? And
12:10
then the core agent is just told, here's
12:12
the one question we're working on now, and its goals.
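As a rough illustration of that out-of-band sub-agent (not the production code; the prompt, model choice, and helper name are invented for the sketch), the idea is a separate LLM call that looks at the transcript so far and the unanswered questions and returns the id of the question to work on next:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def pick_next_question(transcript: list[dict], remaining_questions: list[dict]) -> str:
    """Out-of-band sub-agent: look at the conversation so far and decide which
    remaining question the core voice agent should work on next."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model works for routing
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You route an interview. Given the transcript so far and the "
                    "remaining questions, reply with JSON "
                    '{"next_question_id": "..."} naming the single best question to ask next.'
                ),
            },
            {
                "role": "user",
                "content": json.dumps(
                    {"transcript": transcript, "remaining_questions": remaining_questions}
                ),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)["next_question_id"]
```

The core agent then only ever sees the single active question and its goals, which is what keeps it from wandering across the whole question list.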
12:15
So it's one example of how we had to
12:17
take this thing, you know, from going off the rails
12:19
and get it back on. Another thing we added was this
12:21
sort of, we were calling it the drift detector sub agent. I
12:23
think for a while we were calling it the rabbit hole
12:25
detector. Like these LLMs are
12:27
just so, you know, eager to please. They're
12:29
really like, they have, anyone who's interacted
12:31
with LLMs a lot like knows the personality
12:33
of one, right? And
12:35
so we kind of were like stuck
12:37
where we want it to ask follow-up
12:39
questions. We don't want to constrain
12:41
it to never ask follow -up questions. But if
12:44
you give it a little bit of rope, what
12:46
ends up happening is, no matter what
12:48
you say, it's like, wow, your job is
12:50
so interesting. That's crazy. Tell me more about that.
12:52
Just sort of dig and dig and dig. And
12:55
so what we ended up doing was
12:57
adding this whole side flow that's watching
12:59
the conversation and just sort of assessing,
13:02
all right, has this thing gone off the rails? Are
13:04
we going down the right path? Should we force,
13:06
under the hood, a tool call to
13:08
move on to the next question?
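A sketch of what that drift-detector side flow might look like (again illustrative, with invented names and prompt): a watcher scores the recent exchange and, when it flags drift, the runtime forces the move-on path under the hood.

```python
from openai import OpenAI

client = OpenAI()

def detect_drift(recent_turns: list[dict], active_question: str) -> bool:
    """Side flow that watches the conversation and decides whether the agent
    has gone down a rabbit hole on the active question."""
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: a small, cheap model is enough for this check
        messages=[
            {
                "role": "system",
                "content": (
                    "You monitor an interview. Answer ONLY 'drifting' if the recent "
                    "exchange has wandered away from the active question or is stuck "
                    "in repetitive follow-ups, otherwise answer 'on_track'."
                ),
            },
            {
                "role": "user",
                "content": f"Active question: {active_question}\nRecent turns: {recent_turns}",
            },
        ],
    )
    return verdict.choices[0].message.content.strip() == "drifting"

# If drift is detected, the runtime would force a move-on, e.g. by invoking the
# same tool call used when a question's goals have been met.
```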
13:10
So there's a bunch of these sort of subcomponents that
13:12
go into what feels like an overall large agentic experience,
13:14
but is actually a bunch of subcomponents. Here's
13:16
one of the more surprising ones; maybe anyone that's
13:18
worked deep in the weeds on voice has seen
13:20
this before, but I think this is surprising to a
13:22
lot of people. One
13:24
of the things we wanted to do here was
13:26
show a pleasant UI. And so that actually
13:29
added a bunch of constraints. One constraint was you
13:31
need to actually know what question is being asked
13:33
so you can show a little check mark on
13:35
the screen. You need to know what you're
13:37
planning on moving on to next. So this actually adds
13:39
quite a bit of complexity under the hood.
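One way to think about that extra bookkeeping (a hypothetical shape, not the project's actual one): the backend has to maintain an explicit interview state that the UI can render, rather than letting the state live only implicitly inside the model's context.

```python
from dataclasses import dataclass, field

@dataclass
class InterviewUIState:
    """State the backend pushes to the frontend so it can render progress."""
    active_question_id: str                                           # what is being asked right now
    completed_question_ids: list[str] = field(default_factory=list)  # drives the check marks
    planned_next_question_id: str | None = None                      # what we intend to move on to
    transcript: list[dict] = field(default_factory=list)             # written record shown alongside audio
```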
13:41
One of the areas where this impacted
13:43
things was showing transcripts.
13:45
So we want to show a
13:47
written transcript of what's happened so far. In fact, we even want
13:50
to enable the user to interact over text if they want
13:52
to. The OpenAI models actually make
13:54
this really nice. They return with
13:56
the API response both the audio
13:58
follow-up and a transcript of what's happened
14:00
so far. The problem is that
14:02
transcript is like produced by a separate
14:04
model, that's Whisper running on the side, just
14:06
doing basic sort of speech to text. And
14:09
the core model and the transcript model
14:11
can disagree with each other. I
14:13
think you actually might have had the experience where you were
14:15
like on one of these interviews and there was like
14:17
a sneeze or a cough or something. And I think the
14:19
core model did the right thing. It was like, bless
14:21
you. But the output of the
14:23
transcription was just something that represented the underlying training
14:25
data randomly, like it would say, don't
14:27
forget to like and subscribe, or it would come
14:30
out in Korean or something like that. Yeah,
14:32
we had a lot of random background
14:34
noise turning into foreign language switches. Yeah, yeah,
14:36
totally. So there's a lot
14:38
that went into kind
14:40
of keeping this thing on the rails.
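There's no perfect fix for that mismatch, but a crude guard like the following can flag transcript segments that look like Whisper hallucinations before they reach the UI. This is purely illustrative; the phrase list and the non-Latin-script check are assumptions, not what the project shipped:

```python
import re

# Phrases Whisper is known to hallucinate on silence or background noise.
SUSPECT_PHRASES = [
    "thanks for watching",
    "don't forget to like and subscribe",
]

def looks_hallucinated(segment: str, expected_language: str = "en") -> bool:
    """Heuristic check on a transcript segment produced by the side transcription model."""
    text = segment.strip().lower()
    if any(phrase in text for phrase in SUSPECT_PHRASES):
        return True
    # A sudden run of Korean, Japanese, or Chinese characters in an
    # English-language interview is suspicious.
    if expected_language == "en" and re.search(r"[\uac00-\ud7a3\u3040-\u30ff\u4e00-\u9fff]", text):
        return True
    return False
```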
14:42
One of the outcomes of this is that
14:44
you now have like a lot of different
14:47
knobs and levers. You can adjust the core
14:49
prompt. You can adjust what model you're using.
14:51
You can adjust the questions you're asking. You
14:53
can change the wording of the goals. It's
14:55
a large number of degrees of freedom. I
14:57
mean, it's nice because you now have good
14:59
primitives to control your interviews, but it's scary
15:01
because, you know, kind of anything can happen
15:03
and you don't want to test that in
15:05
front of users. For all of these, and for
15:07
AI projects generally,
15:10
it's absolutely critical early in your
15:12
development process to build strong
15:14
evals, you know, some automated way
15:17
of producing metrics to tell you how well you're
15:19
performing and all the sort of key things you want
15:21
to know about your problem. This one
15:23
is just so hard. Like it's voice, it's
15:26
open -ended. There's no
15:28
really like great source of ground truth.
15:31
Like I don't even know, did you think at
15:33
all early in the project what ground truth would
15:35
look like? I mean, to me, I'm like, could
15:37
we collect a set of recordings of human interviews?
15:39
And even if we did, I don't even know
15:41
what we would do with that. Yeah. I mean,
15:43
so to maybe reframe the question in just sort
15:45
of super simple language: what does a good interview
15:47
sound like, look like, feel like? It turns out,
15:49
once you dig in, it's like, wow,
15:51
that's really subjective. Because is it a
15:54
good interview because it got good information? Is it
15:56
a good interview because it was prompt and didn't
15:58
drag on too long? Is it a good interview
16:00
because, you know, people didn't have to repeat
16:02
themselves? You know, it's all of
16:04
these things that it could be. And
16:06
you add on top of that the
16:08
sort of layer of just human variability.
16:10
Like, you know, we are live
16:12
right now, for example, with a major
16:14
pharmaceutical company, with every single person in
16:17
a department, 250 different people, doing the
16:19
same interview. What's good to them is
16:21
highly variable already,
16:23
just from a human preference standpoint. So
16:25
yeah, I think this is actually an
16:27
enormously challenging thing. I think one of
16:29
the things that we sort of, one
16:31
of the places that we went,
16:33
I know you're going to take it
16:35
in a different direction with evaluation,
16:37
but even going back to the sort
16:39
of the way that the experience
16:41
developed over time, is we added more
16:43
knobs, basically made the experience more
16:45
controllable. Basically that's sort of a shortcut
16:47
to making the user experience better:
16:49
giving the user more ability to
16:51
modify the experience, right? So, you
16:54
know, to your point at the beginning.
16:56
Like, if you're very open -ended, in fact, a great
16:58
use case that I would encourage people to play around
17:00
with voice agents for, the more that
17:02
you're down to kind of just let the
17:04
AI wander, you can get some really
17:06
interesting stuff, right? For us, we're pretty constrained.
17:09
We really needed a set of questions
17:11
to get answered. And, you know,
17:13
there was some amount of sequencing
17:15
that was important. And so we ended
17:17
up, one of the big sort
17:19
of moments for us, I think, with
17:21
this particular project was creating an
17:23
interface experience where people could jump
17:26
from question to question. So, you know, we
17:28
had already added a skip or a, you know,
17:30
stop kind of button, but we wanted to go
17:32
even farther. We felt like we had to go
17:34
even farther, which was just like, I want to
17:36
look at all the questions, say, I
17:38
don't care about all these, but I do want to
17:40
answer that one. And so, you know, there's a bunch
17:42
of different ways to answer it, but it, you know,
17:45
it becomes a product design process very, very quickly. It
17:47
turns out. Yeah. And like,
17:49
you want to know, like to
17:52
your point about what even makes
17:54
a good interview. Like
17:56
you want to know in a lab setting that you're
17:58
going to have good interviews. Like I think your question
18:00
earlier about when do you build, when do you
18:02
buy? Like actually voice agents are an area where
18:04
there's tons of great tooling coming out that like
18:06
this is company Bland AI that jumps to mind
18:08
that they like make a great product for designing
18:10
voice agents. Like they make it really easy to
18:12
put a voice agent on the phone to design
18:14
conversational flows, etc. But I think
18:16
what we see in terms of adoption
18:19
is that the adoption is happening in places
18:21
where people are kind of willing to learn
18:23
on the fly from real user conversations when
18:25
it goes off the rails. And
18:27
the sort of tooling out there for making
18:29
sure in a lab setting that you're confident
18:31
that when I go send this into a
18:33
Fortune 500 company to do interviews, I'm not
18:35
going to do anything stupid. And
18:37
just getting that confidence is really, really
18:39
hard. What we ended
18:41
up doing on this one was we
18:43
built this whole separate system for creating
18:46
synthetic conversations where we collect all
18:48
these sort of written personas of the
18:50
types of real people we think
18:52
we would interview. This is a person
18:54
in marketing and here are the tools they use, here are the
18:56
people they interact with, all sorts
18:58
of things like that. We write out
19:01
this persona and then we have a
19:03
separate LLM play the role of fake
19:05
customer. We conduct these interviews in the
19:07
text domain where over text, our agent
19:09
is interviewing this fake user and then
19:11
we're measuring a bunch of stuff about
19:13
the conversation afterward.
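A stripped-down sketch of that synthetic-conversation loop, where the persona text, prompts, metric names, and the `interview_agent.next_turn` interface are invented for illustration: one LLM plays the persona, the real interview agent runs against it entirely in the text domain, and a judge model scores the finished transcript.

```python
import json
from openai import OpenAI

client = OpenAI()

PERSONAS = [
    "You are a marketing manager. You use a CRM and a design tool daily, work "
    "closely with two designers, and are mildly skeptical of AI. Answer briefly "
    "and stay in character.",
]

def persona_reply(persona: str, transcript: list[dict]) -> str:
    """A separate LLM plays the fake interviewee described by the persona.
    Roles are flipped: from this model's point of view, the interviewer's
    questions are 'user' turns and its own past answers are 'assistant' turns."""
    flipped = [
        {"role": "user" if t["role"] == "assistant" else "assistant", "content": t["content"]}
        for t in transcript
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: model choices here are illustrative only
        messages=[{"role": "system", "content": persona}, *flipped],
    )
    return response.choices[0].message.content

def run_synthetic_interview(persona: str, interview_agent, max_turns: int = 20) -> list[dict]:
    """Run the real interview agent against the fake user, entirely over text."""
    transcript: list[dict] = []
    for _ in range(max_turns):
        question = interview_agent.next_turn(transcript)  # hypothetical interface to the system under test
        transcript.append({"role": "assistant", "content": question})
        transcript.append({"role": "user", "content": persona_reply(persona, transcript)})
    return transcript

def score_interview(transcript: list[dict]) -> dict:
    """Judge model produces rough, admittedly imperfect metrics about the conversation."""
    rubric = (
        "Rate this interview 1-5 on: goal_coverage, concision, repetition_avoidance. "
        "Reply as a JSON object with those three keys."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": rubric},
            {"role": "user", "content": json.dumps(transcript)},
        ],
    )
    return json.loads(response.choices[0].message.content)
```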
19:15
You had asked earlier what makes a great conversation. We spent a
19:17
lot of time on this one trying to define
19:19
that. And we ended up
19:21
with all of these metrics we produced. And
19:24
they're all imperfect. With all these eval
19:26
sorts of questions, you have to find the
19:28
80/20 on, I don't want to spend
19:30
all of my time developing some perfect
19:32
lab metric for what makes a perfect conversation. Because
19:34
there's so much stuff you won't know
19:36
until you go into the wild. I
19:39
think we had this experience where someone just started talking
19:41
to it in German in the middle of the conversation.
19:44
Luckily it just worked, but we wouldn't have guessed that one
19:46
in a lab. Yeah, you know,
19:48
and like adding complexity to this, just to
19:50
the extent that, you know, I think
19:52
my sense is that we've learned a lot
19:54
of things, we've solved a lot of
19:56
problems, but then there's new problems that come
19:58
up. One that I think is a
20:01
continued challenge with the evaluations is we have
20:03
this great, you know, a great suite of
20:05
tools for testing, for kind of seeing
20:07
how different personas might interact. But
20:09
the AI still defaults to assuming
20:11
that all those personas will in
20:13
good faith engage for the time
20:15
it takes to finish the interview.
20:17
Whereas like within the first three
20:20
interviews that we tested, a
20:22
CEO started swearing at the thing like
20:24
halfway through, you know, question four and dropped
20:26
out. By the way, he ended up
20:28
coming back and it was a very useful
20:30
interview. And so it all worked out
20:32
fine. But the
20:34
synthetic testers did not think to storm
20:37
out of the room
20:39
as part of their tests based on
20:41
their personality. Yeah. I don't know if,
20:43
if you've ever done this, but sometimes I just
20:45
have fun going into ChatGPT and trying
20:47
to get the last word and it never happens.
20:49
Right. You say, okay, bye. And it's like, all right,
20:51
see you. Uh, everything's fine. They don't give up. I
20:54
do think though, like the, the
20:56
tuning of the underlying, like normally you
20:58
use these evals just to build
21:00
the software. It's like you're writing a
21:03
custom workflow where you know
21:05
reasonably well what good looks like. And
21:07
then the question is, is our system
21:09
good? Here, you're also
21:11
designing an interview while you design the
21:13
system that can support interviews. And
21:15
the number of degrees of freedom is
21:17
super, super high. I think that's
21:19
common across anything voice and anything that
21:21
is conversational. The developers
21:23
working on chat, GPT,
21:26
have their work cut out for them to
21:28
figure out, are we having good conversations?
21:30
Do we mess up? Those are like really
21:33
fuzzy things to measure. Yeah,
21:35
you know, and I think too, one of the
21:37
learnings for me,
21:39
which is helpful especially because our use case is literally
21:41
helping people figure out where to, you know,
21:43
deploy agents or which agent use cases to
21:45
think about. We really are, you
21:47
know, there's all sorts of different
21:49
definitions of what exactly an agent means. But
21:51
I tend to come back to the
21:53
very, very kind of clear and simple way
21:55
that I think enterprises think about it,
21:58
which is AI is stuff that I use
22:00
to make my work better; agents are,
22:02
you know, things that do the
22:04
work for me. And that is very
22:06
crisp and clean in the context of this
22:08
voice agent where we are handing
22:10
a customer over to it to ask a bunch
22:12
of questions with information that we need
22:14
to get with no ability to intervene if
22:16
it goes off the rails or doesn't
22:18
do a good job. You know, like,
22:21
it's a small thing, it's,
22:23
you know, it's not all that risky, but
22:25
ultimately we're letting the agent do the
22:27
interview, and it really is a clearly different
22:29
thing than, you know, us using
22:31
ChatGPT to help prep for an interview or
22:33
something like that. And it turns out,
22:35
and Eddie, I think this is sort of
22:37
part of your point, literally as soon
22:39
as you are allowing a thing to go
22:41
do the thing, the degrees of freedom
22:44
just become so much more immense than the
22:46
normal software experience. And even in a
22:48
relatively constrained environment, like there's 20 questions that
22:50
we really need you to answer. Yeah,
22:52
I think a question on like everybody's
22:54
mind right now is, like, what is an
22:57
agent? Like, everybody's got this separate definition,
22:59
a separate way of framing the problem,
23:01
and it's just like a hot topic
23:03
in conversation right now. I think we both
23:05
agree that this one is a
23:07
highly agentic kind of example in a fairly
23:09
obvious way. I think we tend to
23:12
think of like agency as being this sort
23:14
of spectrum, like there are less agentic
23:16
things and there are more agentic things, and like
23:19
there are a few sort of sub attributes that
23:21
lead to something feeling more agentic. And like,
23:23
you know, one sort of element here is how
23:25
open -ended is the task? Like here it's completely
23:27
open -ended, right? Like you're given an interview, but
23:29
you can really vary what you're doing. Another
23:32
is like how complex is it? You know,
23:34
we have some open -ended tasks, but it's
23:36
like the task is spam detection. It's like
23:38
the eventual result is like, you know, is
23:40
this spam or is this not? This one
23:42
is super open -ended. You have very broad goals
23:44
you're defining. And then the last
23:47
one is sort of like, I think what you
23:49
were sort of talking about a second ago, which is
23:51
who's taking the action at the end of all
23:53
of this? You know, is there some system that's behind
23:55
the scenes, eventually making a recommendation to a person? In
23:58
this case, no, right? Like there's nobody sitting there
24:00
watching the interview. The person doesn't even get
24:02
involved until you're reviewing the results of the interview
24:04
and trying to synthesize it. Even then, I think
24:06
like that's on the to-do list to start
24:08
to tackle next, right? We're going to keep moving
24:10
through that and see how many places we can
24:12
apply agents in this process. So
24:14
as we kind of zoom out. having
24:17
gone through this experience, and obviously you're
24:19
bringing to bear tons and tons of different
24:21
projects at the same time, what
24:23
does this make you think about? Are
24:25
there other use cases that you're excited
24:27
about for voice agents, where you think that
24:29
companies should be really thinking about these
24:31
things? And maybe that's either specific use cases
24:33
or just types of problems or types
24:36
of opportunities that you think they're particularly well
24:38
suited for. Yeah, I think
24:40
inbound phone calls, and especially
24:42
within that spectrum, generally what you're
24:44
looking for is, what's the
24:48
50% of call volume that
24:48
is for very simple tasks? And
24:51
start with that with the ability to escalate
24:53
for the more complex things. So
24:55
that's one bucket. Another bucket
24:57
is outbound B2B calls. So
24:59
things like calling insurance companies to get, you
25:01
know, to gather information. That's
25:03
another big bucket. In general,
25:05
one of the best practices with this is, you
25:08
know, you always want the person who's talking
25:10
to the agent to know they're talking to an
25:12
AI agent and not to pretend that it's
25:14
a human. I think people are
25:16
very forgiving with being on the phone with AI
25:18
agents and they tend to be very positive
25:20
experiences, but I can imagine that hiding it
25:22
from a person would open
25:24
you up to a very bad experience. If
25:26
I just think back to my last week, what
25:28
I've seen in voice agents, they're all over
25:31
the place, and they're all super interesting in their
25:33
own way. We see folks
25:35
in health care that are currently doing
25:37
a bunch of... It's very similar to
25:39
your use case. It's someone conducting interviews
25:41
today. It's someone interviewing a bunch of
25:43
physicians to do market research. I
25:45
think it's open -ended, whether the right
25:47
answer there, in such a regulated place, is
25:49
to allow a voice agent to do
25:51
that, or if the voice agent's riding
25:54
shotgun and providing suggestions. But in
25:56
either case, seems like it can help
25:58
there. We've seen folks in the rail industry,
26:00
you know, going on trains doing safety
26:02
sort of inspections where, like, they're trying
26:04
to take notes on an
26:06
app today, and it's like super awkward. They're
26:08
like on a train interviewing a conductor,
26:10
talking out loud to them, but also trying
26:12
to take notes. And it's just a
26:14
bad UX. And so the agents sort of
26:16
guiding that is potentially a better experience. A
26:18
technician who's on site and needs to
26:20
refer to an instruction manual for this
26:22
big complicated piece of machinery. And instead
26:25
of trying to flip through the manual,
26:27
they could maybe interact via voice. Awesome.
26:29
Yeah. I mean, certainly I think
26:31
our experience has been immensely positive. Like I
26:33
said at the beginning, this is not
26:35
a one or two X improvement over the
26:38
alternative. It is a massive, you
26:40
know, it's, you can't even really calculate
26:42
it. Like it is, it was not possible
26:44
before to interview every single person in a company
26:46
about what they do and try to map
26:48
agent opportunities. It is now possible. Theoretically, if they
26:50
all did it at the exact same time,
26:52
it could all happen, you know, in a half
26:54
an hour. So, you know, we're super excited.
26:56
We love working with you guys on this. You
26:58
know, we're excited that more and more companies
27:00
are interacting with it, giving us more context to
27:03
learn from. Really appreciate the time today as
27:05
well to share it and excited to bring you
27:07
guys back as we continue to build this
27:09
out. Awesome. Thanks so much for having us.
27:11
Yeah, thanks for having us.