Episode Transcript
0:00
Today on the AI Daily Brief, a
0:02
case study in building voice agents. The
0:04
AI Daily Brief is a daily podcast and video
0:06
about the most important news and discussions in AI. To
0:09
join the conversation, follow the Discord link in our
0:11
show notes. Today
0:18
we're doing something a little bit different and
0:20
that I'm very excited for. As you guys
0:23
might have heard, over the last six months,
0:25
our team at Superintelligent has been working
0:27
on a voice agent that is effectively the
0:29
core of a new type of automated consultant
0:31
that we deploy as part of our agent
0:33
readiness audits. Agent readiness audits
0:35
are a process whereby we go in
0:37
and interview people inside companies about A,
0:40
all of the AI activities and agent
0:42
activities they're currently engaged in, as well
0:44
as B, just their work more broadly. The
0:47
goal is to benchmark their AI and agent
0:49
usage relative to their peers and competitors, as
0:51
well as to map the opportunities they have
0:53
to actually deploy agents to get value. A
0:56
core part of how we do this
0:58
is a voice agent that we've developed
1:00
that can interview dozens, hundreds, or thousands
1:02
of people at the same time, on
1:04
their time, 24/7, totally unlocking
1:06
a differentiated ability to capture information
1:08
compared to anything that consultants have previously
1:10
had. Today, we're talking with our
1:12
partners at Fractional who have been helping us build
1:15
this technology to do a bit of a case study
1:17
in what it looks like to actually build a
1:19
voice agent. It's been a really fascinating process and we're
1:21
excited to share a bit of the learning, especially
1:23
because we think that this is a technology that many
1:25
of you are probably going to deploy for your
1:27
own purposes in the months or years to come. All
1:30
right, Eddie, Chris, welcome to the AI
1:32
Daily Brief. How you doing? Doing
1:35
great. Awesome. Thanks for having
1:37
us. Yeah, this is going to be a fun one.
1:39
I mean, so this is something where we're talking about
1:41
something that you guys have built, you know, lots of
1:43
versions of we have built together. And I think that,
1:45
you know, this is a little bit different than our
1:47
normal content, because as opposed to just talking about, you
1:49
know, what's going on in markets theoretically or what people
1:51
are building theoretically, we're actually talking about something that we've
1:53
got live, that we've done some reps
1:56
on. Let's put it that way. So I think just
1:58
to kick it off, maybe if you guys could give a
2:00
little bit of background on Fractional and
2:02
yourself, just so people have that context before
2:04
we dive in. Yeah, so
2:06
I'm Chris, CEO and co-founder here
2:08
at Fractional. The basic
2:10
thesis behind the business is that
2:12
one of the biggest winners of this
2:14
whole AI moment is going to be
2:17
non-AI businesses, your everyday company that
2:19
can use gen AI to improve its
2:21
operations and improve its products and services,
2:23
and that those companies need help. They
2:25
especially need help from top-caliber
2:27
engineers who can wrangle this magic
2:29
hallucinating ingredient into production-grade systems.
2:31
And so the purpose behind Fractional
2:33
is to bring those engineers together
2:35
in one room, have them all
2:37
work on gen AI projects and
2:39
learn best practices from each other and build out
2:41
the best gen AI engineering team in the world.
2:43
And so that's been very much the vision from
2:45
day one. And it's going exactly according
2:48
to plan, which is always fun with a
2:50
startup. And I think it's the first time in our
2:52
entire careers where that's the case. So it's been
2:54
great. And working with you and your team on
2:56
the voice agent has been really fun. Awesome.
2:59
And Eddie, maybe we can actually bring
3:01
you a little bit with my first question just
3:03
to set up. So I think that the
3:05
main thing we want to do today is actually
3:07
talk about what it looks like to, you
3:09
know, put a voice agent into production. You
3:11
know, I think we learned a, we have
3:13
learned a bunch of things. We continue to learn
3:15
things in practice, but maybe to kick off,
3:17
I think just zooming out, one of the big
3:19
questions that we always deal with when it
3:21
comes to enterprise customers, enterprises that are thinking about
3:23
AI transformation is this buy build question. Right.
3:25
And I wonder, you know, you guys are on the
3:27
front lines dealing with this. Is this even
3:30
the right way to think about things at
3:32
this point? You know, especially when it
3:34
comes to agents, is there actually like a
3:36
strict buy build hierarchy? Is everything just
3:38
some spectrum of build? What do you think
3:40
the current state of buying versus building
3:42
is with agents, especially as companies are thinking
3:44
about what it means to even enter
3:46
the agent space? Yeah, I think
3:48
it's right that everything exists somewhere on the spectrum.
3:50
I think it's pretty rare that you have
3:52
a workflow that's a good fit or a product
3:54
feature that's a good fit for an
3:56
agentic solution where you can just go buy something off
3:58
the shelf that just works. The off the shelf stuff
4:00
is great for really general purpose productivity tools
4:02
and like, you know, things like deep research
4:04
that are sort of generalized tools are
4:07
like awesome. But when it comes
4:09
to, you know, specific bespoke
4:11
workflows in your business, I
4:13
think there's a spectrum of are we building
4:15
all the way from scratch? Are we building
4:17
on top of good, powerful new primitives that
4:19
are coming into the market? Are we doing
4:21
some building work that requires just sort of
4:23
integration of off-the-shelf tools? But I think
4:25
it's rare that we see great fits of
4:27
sort of off -the -shelf tools that really replace
4:29
an existing manual workflow. Yeah,
4:32
and this has sort of been our experience as
4:34
well. Everything is to some
4:36
extent built, even if it's only customized.
4:38
And so with that as background, you
4:40
know, you guys have now had a chance to
4:43
spend a bunch of time, you know, thinking about voice
4:45
agents, digging into voice agents. There
4:47
clearly seems to be resonance with voice agents
4:49
in the market. A lot of people
4:51
are finding a lot of different use cases.
4:54
Do you have a thesis for why that is
4:56
or what you attribute that to? I
4:58
think the technology has just gotten a lot
5:00
better and I think the applications are
5:02
obvious. Any business that has some kind of
5:04
call center or has some kind of
5:06
bottleneck in their business that is voice related
5:08
is looking in the direction of this
5:11
technology because I think the applications are broad
5:13
and obvious. And the
5:15
technology is finally there. If you have an experience
5:17
of talking to one of these things in the
5:19
wild, I've only had a few
5:21
thus far, but they're starting to become more
5:23
frequent. And every time I'm always impressed by
5:25
what a pleasant experience it is as a
5:27
consumer. And so I think we're just
5:29
going to start seeing these things pop up everywhere. Also,
5:32
voice is just a great fit
5:34
for certain kinds of data
5:36
collection, basically. You know, I
5:38
think you'll see it in the
5:40
use case we're going to dive into in a minute,
5:42
Super's use case. You know, there's a reason
5:44
why when you go to do research about what's going
5:46
on inside of a big company, one of the
5:48
things you do is you go in and you interview
5:50
people and you ask them questions instead of just
5:52
like sending them a survey, you know, that sort of
5:54
fixed data entry kind of task is not a great
5:57
fit for a lot of kinds of situations where
5:59
you want big open -ended responses and you want
6:01
people to sort of ramble and, you
6:03
know, realize their thinking on the fly, things
6:05
like that happen really naturally over voice.
6:07
And to Chris's point, finally, the technology is
6:09
at a place where we can start to chip
6:11
away at the kind of stuff that only a human
6:13
interviewer could have done before. Yeah. I
6:15
mean, I think it's interesting. So
6:17
for background, we're going to talk
6:19
about, you know, the voice agent
6:21
that we've been collaborating on is this
6:23
sort of data collection experience, right?
6:26
It is meant to capture information around
6:28
people's current workflows, their current AI,
6:30
you know, adoption techniques in order to
6:32
help us give them recommendations around
6:34
what agent opportunities they have. That's the
6:36
core idea. And the starting point,
6:38
the central sort of genesis of this
6:40
was that, A, to your point,
6:42
Chris, the technology was such
6:44
that it actually just is good enough to
6:46
do this, right? You can actually have an agent
6:48
interview people and it does a pretty good
6:50
job. You know, not off the shelf, as we'll
6:52
see. You know, we had to do a
6:54
lot of kind of development to make it work.
6:56
But still, the capabilities are there. The second
6:59
piece, and I think this is the piece that
7:01
you were speaking to, is it is actually
7:03
not just as good an experience as the human
7:05
equivalent. There is a lot to
7:07
recommend this as a better, an actual, just
7:09
factual, better experience. First, the fact that
7:11
you can collect information with voice and having
7:13
people talk instead of people type, just
7:15
instantly, it's so much easier for many, many
7:17
people, if not most people to ramble
7:19
about something and just speak at it, than
7:21
to sit down, try to collect their
7:23
thoughts, try to structure it and type it.
7:25
And it's faster, no matter what, right?
7:27
Just the amount of information
7:29
per unit of time is going
7:31
to be way, way higher if
7:33
you're having people talk. So that's one.
7:35
Second, the ability to do that on demand,
7:37
on your own schedule, wherever you are, maybe
7:40
if you're walking to work, whatever, like
7:42
4 a.m. at night when you can't
7:44
sleep, as opposed to having to schedule
7:46
a human interview. Again, that's
7:48
not a 1x improvement. That's a
7:50
10x improvement in convenience. And so
7:52
I think those two things combined, both
7:54
the fact that the technology is there and
7:57
it's actually just a better potential experience
7:59
makes a huge difference. You know, certainly that's sort
8:01
of like what the insight was that we had going
8:03
into it. Yeah. In addition to
8:05
that, you don't have to hire out a team
8:07
of thousands of consultants in order to conduct the
8:09
kind of interviews that you guys want. Yep.
8:12
In fact, it's interesting to, uh, you know, maybe
8:14
to come back to this, but
8:16
you know, I've had a lot of conversations with
8:18
consultants after having built this. And on
8:20
the one hand it's fairly disruptive to at least
8:22
a piece of what they're trying to do,
8:24
right? This is something that consultants bill lots and
8:26
lots of money for, doing this data
8:28
collection. Interestingly, what I
8:30
keep coming across is
8:32
consultants don't see their
8:34
value, their primary value
8:36
as collecting information. It's
8:39
like the proprietary knowledge and experience they have, the
8:41
way that they analyze it. So they're actually extraordinarily
8:43
bullish. Like they don't want to have
8:45
to force their customers to use a huge
8:47
portion of their budget just on the
8:49
data collection, they'd much rather have that be
8:51
able to go to the actual processing,
8:53
the analysis, what they do next with it.
8:55
Right. So even though this sort of
8:57
piece is actually theoretically disrupted by this, I
8:59
think it's likely to shape how we
9:01
see that industry evolve as well. I
9:03
think there's also just a whole breadth of
9:05
insights that are probably not being captured in a
9:07
lot of those consulting scenarios just because you're
9:10
limited by only being able to do whatever, 10
9:12
interviews or something like that. Whereas what could
9:14
you learn if you could actually do 1 ,000
9:16
custom interviews in parallel and be able
9:18
to actually process the data coming
9:20
back from that? Yeah, the
9:22
point about this not being what the consultants
9:24
want to be doing, too. That
9:26
is something we see broadly across basically
9:29
every project that we do. It's
9:31
the repetitive work that takes away
9:33
from the higher order tasks that you want
9:35
to get to on your to-do list and
9:37
don't have time to get to that AI
9:39
is so well suited for, and very often
9:41
we find that exact kind of dynamic. We're
9:43
automating away the things where people just
9:46
bang their head against the wall, do
9:48
this a bunch of times, and it's not
9:50
super intellectually stimulating, that kind of stuff. We
9:52
can delegate, whether that's voice or
9:54
text and free up people to do
9:56
higher order tasks. Awesome. Well,
9:58
let's dive in and talk about what it looks
10:00
like to actually build a voice agent in practice and
10:03
what we've learned. So Eddie, you know, I'm not sure
10:05
exactly what the right place to start is, but I'll
10:07
let you take it away from here and
10:09
dig into it. Yeah, absolutely. So,
10:11
you know, I think you
10:13
sort of called out correctly earlier that like
10:15
the technology is there. But that
10:17
doesn't mean it just works off the shelf or that you
10:19
don't need to do a bunch of custom work here. And
10:22
so the technology in this use case that
10:24
we really leaned on to build this interview
10:26
agent. And by the way, the way this
10:28
agent actually works in practice is we configure
10:30
it with sets of interview questions and goals.
10:32
So here are the things we want the
10:34
person to be asked. Here are the reasons
10:36
why we're asking them. We prioritize those goals.
10:38
And that's kind of the input to this
10:41
very agentic system that is then in
10:43
charge of deciding how exactly do I phrase
10:45
these questions? When do I follow up?
10:47
What do I ask next? When have I
10:49
met my goals? And so
10:51
it's got a lot of agency.
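To make that configuration concrete, here's a minimal sketch in Python of what a prioritized question-and-goal spec like the one described might look like. The field names are hypothetical, not the actual schema used in the project:

```python
# Hypothetical interview spec: questions plus prioritized goals.
# The agentic runtime, not this config, decides phrasing, follow-ups,
# and when a goal has been satisfied.
INTERVIEW_SPEC = {
    "questions": [
        {
            "id": "current_ai_usage",
            "prompt": "What AI or agent tools do you use in your work today?",
            "goals": [
                {"goal": "Name the specific tools in use", "priority": 1},
                {"goal": "Understand which tasks they are used for", "priority": 2},
            ],
        },
        {
            "id": "manual_workflows",
            "prompt": "Walk me through a repetitive part of your week.",
            "goals": [
                {"goal": "Capture at least one concrete workflow end to end", "priority": 1},
                {"goal": "Estimate time spent per week", "priority": 3},
            ],
        },
    ],
}
```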
10:53
It's highly sort of undirected. And
10:55
the kind of out -of -the -box technology
10:58
that we have access to right now,
11:00
and there's a few different alternatives here,
11:02
but the one we chose for this
11:04
project was the OpenAI real -time API, which
11:06
has great real -time voice capabilities. It's
11:08
got nice realistic voices that sound
11:10
pretty human, and it's pretty smart in
11:12
its ability to make decisions on the fly.
11:15
If you just give a monolithic prompt to
11:17
that model that tells it about the
11:19
interview and the questions it might want
11:21
to ask, you get a pretty cool
11:23
result, but it goes off the rails
11:25
all the time. It asks weird questions.
11:27
It's hard to tune when it follows
11:29
up. If your only mechanism for control
11:32
here is a giant monolithic prompt, your
11:34
hands are really tied. And so
11:36
we quickly found that while it ran some
11:38
interviews well, it ran some interviews really poorly, and
11:41
our control over what happened next was
11:43
pretty limited. And so one
11:45
of the areas where it fell down
11:47
was... It didn't always make smart choices about
11:49
what question to ask when. We would tell
11:51
it all the questions up front. It would
11:53
be up to it to decide which one
11:55
is next. And so what we ended up doing
11:58
is abstracting out an entirely out of band
12:00
sub agent that's running in
12:02
parallel in the background, assessing the
12:04
conversation. And its whole task is like,
12:06
if we were to move on to another question right
12:08
now, which one should we move on to? And
12:10
then the core agent is just told, here's
12:12
the one question we're working on now, and its goals.
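As a rough illustration of that out-of-band sub-agent (not the production code; the prompt, model choice, and helper name are invented for the sketch), the idea is a separate LLM call that looks at the transcript so far and the unanswered questions and returns the id of the question to work on next:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def pick_next_question(transcript: list[dict], remaining_questions: list[dict]) -> str:
    """Out-of-band sub-agent: look at the conversation so far and decide which
    remaining question the core voice agent should work on next."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model works for routing
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You route an interview. Given the transcript so far and the "
                    "remaining questions, reply with JSON "
                    '{"next_question_id": "..."} naming the single best question to ask next.'
                ),
            },
            {
                "role": "user",
                "content": json.dumps(
                    {"transcript": transcript, "remaining_questions": remaining_questions}
                ),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)["next_question_id"]
```

The core agent then only ever sees the single active question and its goals, which is what keeps it from wandering across the whole question list.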
12:15
So it's one example of how we had to
12:17
take this thing, you know, from going off the rails
12:19
and get it back on. Another thing we added was this
12:21
sort of, we were calling it the drift detector sub agent. I
12:23
think for a while we were calling it the rabbit hole
12:25
detector. Like these LLMs are
12:27
just so, you know, eager to please. They're
12:29
really like, they have, anyone who's interacted
12:31
with LLMs a lot like knows the personality
12:33
of one, right? And
12:35
so we kind of were like stuck
12:37
where we want it to ask follow-up
12:39
questions. We don't want to constrain
12:41
it to never ask follow -up questions. But if
12:44
you give it a little bit of rope, what
12:46
ends up happening is, no matter what
12:48
you say, it's like, wow, your job is
12:50
so interesting. That's crazy. Tell me more about that.
12:52
Just sort of dig and dig and dig. And
12:55
so what we ended up doing was
12:57
adding this whole side flow that's watching
12:59
the conversation and just sort of assessing,
13:02
all right, has this thing gone off the rails? Are
13:04
we going down the right path? Should we force,
13:06
under the hood, a tool call to
13:08
move on to the next question?
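A sketch of what that drift-detector side flow might look like (again illustrative, with invented names and prompt): a watcher scores the recent exchange and, when it flags drift, the runtime forces the move-on path under the hood.

```python
from openai import OpenAI

client = OpenAI()

def detect_drift(recent_turns: list[dict], active_question: str) -> bool:
    """Side flow that watches the conversation and decides whether the agent
    has gone down a rabbit hole on the active question."""
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: a small, cheap model is enough for this check
        messages=[
            {
                "role": "system",
                "content": (
                    "You monitor an interview. Answer ONLY 'drifting' if the recent "
                    "exchange has wandered away from the active question or is stuck "
                    "in repetitive follow-ups, otherwise answer 'on_track'."
                ),
            },
            {
                "role": "user",
                "content": f"Active question: {active_question}\nRecent turns: {recent_turns}",
            },
        ],
    )
    return verdict.choices[0].message.content.strip() == "drifting"

# If drift is detected, the runtime would force a move-on, e.g. by invoking the
# same tool call used when a question's goals have been met.
```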
13:10
So there's a bunch of these sort of subcomponents that
13:12
go into what feels like an overall large agentic experience,
13:14
but is actually a bunch of subcomponents. Here's
13:16
one of the more surprising ones; maybe anyone that's
13:18
worked deep in the weeds on voice has seen
13:20
this before, but I think this is surprising to a
13:22
lot of people. One
13:24
of the things we wanted to do here was
13:26
show a pleasant UI. And so that actually
13:29
added a bunch of constraints. One constraint was you
13:31
need to actually know what question is being asked
13:33
so you can show a little check mark on
13:35
the screen. You need to know what you're
13:37
planning on moving on to next. So this actually adds
13:39
quite a bit of complexity under the hood.
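One way to think about that extra bookkeeping (a hypothetical shape, not the project's actual one): the backend has to maintain an explicit interview state that the UI can render, rather than letting the state live only implicitly inside the model's context.

```python
from dataclasses import dataclass, field

@dataclass
class InterviewUIState:
    """State the backend pushes to the frontend so it can render progress."""
    active_question_id: str                                           # what is being asked right now
    completed_question_ids: list[str] = field(default_factory=list)  # drives the check marks
    planned_next_question_id: str | None = None                      # what we intend to move on to
    transcript: list[dict] = field(default_factory=list)             # written record shown alongside audio
```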
13:41
One of the areas where this impacted
13:43
things was showing transcripts.
13:45
So we want to show a
13:47
written transcript of what's happened so far. In fact, we even want
13:50
to enable the user to interact over text if they want
13:52
to. The OpenAI models actually make
13:54
this really nice. They return with
13:56
the API response both the audio
13:58
follow-up and a transcript of what's happened
14:00
so far. The problem is that
14:02
transcript is like produced by a separate
14:04
model, that's Whisper running on the side, just
14:06
doing basic sort of speech to text. And
14:09
the core model and the transcript model
14:11
can disagree with each other. I
14:13
think you actually might have had the experience where you were
14:15
like on one of these interviews and there was like
14:17
a sneeze or a cough or something. And I think the
14:19
core model did the right thing. It was like, bless
14:21
you. But the output of the
14:23
transcription was just something that represented the underlying training
14:25
data randomly, like it would say, don't
14:27
forget to like and subscribe, or it would come
14:30
out in Korean or something like that. Yeah,
14:32
we had a lot of random background
14:34
noise turning into foreign language switches. Yeah, yeah,
14:36
totally. So there's a lot
14:38
that went into kind
14:40
of keeping this thing on the rails.
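There's no perfect fix for that mismatch, but a crude guard like the following can flag transcript segments that look like Whisper hallucinations before they reach the UI. This is purely illustrative; the phrase list and the non-Latin-script check are assumptions, not what the project shipped:

```python
import re

# Phrases Whisper is known to hallucinate on silence or background noise.
SUSPECT_PHRASES = [
    "thanks for watching",
    "don't forget to like and subscribe",
]

def looks_hallucinated(segment: str, expected_language: str = "en") -> bool:
    """Heuristic check on a transcript segment produced by the side transcription model."""
    text = segment.strip().lower()
    if any(phrase in text for phrase in SUSPECT_PHRASES):
        return True
    # A sudden run of Korean, Japanese, or Chinese characters in an
    # English-language interview is suspicious.
    if expected_language == "en" and re.search(r"[\uac00-\ud7a3\u3040-\u30ff\u4e00-\u9fff]", text):
        return True
    return False
```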
14:42
One of the outcomes of this is that
14:44
you now have like a lot of different
14:47
knobs and levers. You can adjust the core
14:49
prompt. You can adjust what model you're using.
14:51
You can adjust the questions you're asking. You
14:53
can change the wording of the goals. It's
14:55
a large number of degrees of freedom. I
14:57
mean, it's nice because you now have good
14:59
primitives to control your interviews, but it's scary
15:01
because, you know, kind of anything can happen
15:03
and you don't want to test that in
15:05
front of users. For all of these, and for
15:07
AI projects generally,
15:10
it's absolutely critical early in your
15:12
development process to build strong
15:14
evals, you know, some automated way
15:17
of producing metrics to tell you how well you're
15:19
performing and all the sort of key things you want
15:21
to know about your problem. This one
15:23
is just so hard. Like it's voice, it's
15:26
open -ended. There's no
15:28
really like great source of ground truth.
15:31
Like I don't even know, did you think at
15:33
all early in the project what ground truth would
15:35
look like? I mean, to me, I'm like, could
15:37
we collect a set of recordings of human interviews?
15:39
And even if we did, I don't even know
15:41
what we would do with that. Yeah. I mean,
15:43
so to maybe reframe the question in just sort
15:45
of super simple language: what does a good interview
15:47
sound like, look like, feel like? It turns out,
15:49
once you dig in, it's like, wow,
15:51
that's really subjective. Because is it a
15:54
good interview because it got good information? Is it
15:56
a good interview because it was prompt and didn't
15:58
drag on too long? Is it a good interview
16:00
because, you know, people didn't have to repeat
16:02
themselves? You know, it's all of
16:04
these things that it could be. And
16:06
you add on top of that the
16:08
sort of layer of just human variability.
16:10
Like, you know, we are live
16:12
right now, for example, with a major
16:14
pharmaceutical company, with every single person in
16:17
a department, 250 different people, doing the
16:19
same interview. What's good to them is
16:21
highly variable already,
16:23
just from a human preference standpoint. So
16:25
yeah, I think this is actually an
16:27
enormously challenging thing. I think one of
16:29
the things that we sort of, one
16:31
of the places that we went,
16:33
I know you're going to take it
16:35
in a different direction with evaluation,
16:37
but even going back to the sort
16:39
of the way that the experience
16:41
developed over time, is we added more
16:43
knobs, basically made the experience more
16:45
controllable. Basically that's sort of a shortcut
16:47
to making the user experience better:
16:49
giving the user more ability to
16:51
modify the experience, right? So, you
16:54
know, to your point at the beginning.
16:56
Like, if you're very open -ended, in fact, a great
16:58
use case that I would encourage people to play around
17:00
with voice agents for, the more that
17:02
you're down to kind of just let the
17:04
AI wander, you can get some really
17:06
interesting stuff, right? For us, we're pretty constrained.
17:09
We really needed a set of questions
17:11
to get answered. And, you know,
17:13
there was some amount of sequencing
17:15
that was important. And so we ended
17:17
up, one of the big sort
17:19
of moments for us, I think, with
17:21
this particular project was creating an
17:23
interface experience where people could jump
17:26
from question to question. So, you know, we
17:28
had already added a skip or a, you know,
17:30
stop kind of button, but we wanted to go
17:32
even farther. We felt like we had to go
17:34
even farther, which was just like, I want to
17:36
look at all the questions, say, I
17:38
don't care about all these, but I do want to
17:40
answer that one. And so, you know, there's a bunch
17:42
of different ways to answer it, but it, you know,
17:45
it becomes a product design process very, very quickly. It
17:47
turns out. Yeah. And like,
17:49
you want to know, like to
17:52
your point about what even makes
17:54
a good interview. Like
17:56
you want to know in a lab setting that you're
17:58
going to have good interviews. Like I think your question
18:00
earlier about when do you build, when do you
18:02
buy? Like actually voice agents are an area where
18:04
there's tons of great tooling coming out that like
18:06
this is company Bland AI that jumps to mind
18:08
that they like make a great product for designing
18:10
voice agents. Like they make it really easy to
18:12
put a voice agent on the phone to design
18:14
conversational flows, etc. But I think
18:16
what we see in terms of adoption
18:19
is that the adoption is happening in places
18:21
where people are kind of willing to learn
18:23
on the fly from real user conversations when
18:25
it goes off the rails. And
18:27
the sort of tooling out there for making
18:29
sure in a lab setting that you're confident
18:31
that when I go send this into a
18:33
Fortune 500 company to do interviews, I'm not
18:35
going to do anything stupid. And
18:37
just getting that confidence is really, really
18:39
hard. What we ended
18:41
up doing on this one was we
18:43
built this whole separate system for creating
18:46
synthetic conversations where we collect all
18:48
these sort of written personas of the
18:50
types of real people we think
18:52
we would interview. This is a person
18:54
in marketing and here are the tools they use, here are the
18:56
people they interact with, all sorts
18:58
of things like that. We write out
19:01
this persona and then we have a
19:03
separate LLM play the role of fake
19:05
customer. We conduct these interviews in the
19:07
text domain where over text, our agent
19:09
is interviewing this fake user and then
19:11
we're measuring a bunch of stuff about
19:13
the conversation afterward.
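A stripped-down sketch of that synthetic-conversation loop, where the persona text, prompts, metric names, and the `interview_agent.next_turn` interface are invented for illustration: one LLM plays the persona, the real interview agent runs against it entirely in the text domain, and a judge model scores the finished transcript.

```python
import json
from openai import OpenAI

client = OpenAI()

PERSONAS = [
    "You are a marketing manager. You use a CRM and a design tool daily, work "
    "closely with two designers, and are mildly skeptical of AI. Answer briefly "
    "and stay in character.",
]

def persona_reply(persona: str, transcript: list[dict]) -> str:
    """A separate LLM plays the fake interviewee described by the persona.
    Roles are flipped: from this model's point of view, the interviewer's
    questions are 'user' turns and its own past answers are 'assistant' turns."""
    flipped = [
        {"role": "user" if t["role"] == "assistant" else "assistant", "content": t["content"]}
        for t in transcript
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: model choices here are illustrative only
        messages=[{"role": "system", "content": persona}, *flipped],
    )
    return response.choices[0].message.content

def run_synthetic_interview(persona: str, interview_agent, max_turns: int = 20) -> list[dict]:
    """Run the real interview agent against the fake user, entirely over text."""
    transcript: list[dict] = []
    for _ in range(max_turns):
        question = interview_agent.next_turn(transcript)  # hypothetical interface to the system under test
        transcript.append({"role": "assistant", "content": question})
        transcript.append({"role": "user", "content": persona_reply(persona, transcript)})
    return transcript

def score_interview(transcript: list[dict]) -> dict:
    """Judge model produces rough, admittedly imperfect metrics about the conversation."""
    rubric = (
        "Rate this interview 1-5 on: goal_coverage, concision, repetition_avoidance. "
        "Reply as a JSON object with those three keys."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": rubric},
            {"role": "user", "content": json.dumps(transcript)},
        ],
    )
    return json.loads(response.choices[0].message.content)
```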
19:15
You had asked earlier what makes a great conversation. We spent a
19:17
lot of time on this one trying to define
19:19
that. And we ended up
19:21
with all of these metrics we produced. And
19:24
they're all imperfect. With all these eval
19:26
sorts of questions, you have to find the
19:28
80/20 on, I don't want to spend
19:30
all of my time developing some perfect
19:32
lab metric for what makes a perfect conversation. Because
19:34
there's so much stuff you won't know
19:36
until you go into the wild. I
19:39
think we had this experience where someone just started talking
19:41
to it in German in the middle of the conversation.
19:44
Luckily it just worked, but we wouldn't have guessed that one
19:46
in a lab. Yeah, you know,
19:48
and like adding complexity to this, just to
19:50
the extent that, you know, I think
19:52
my sense is that we've learned a lot
19:54
of things, we've solved a lot of
19:56
problems, but then there's new problems that come
19:58
up. One that I think is a
20:01
continued challenge with the evaluations is we have
20:03
this great, you know, a great suite of
20:05
tools for testing, for kind of seeing
20:07
how different personas might interact. But
20:09
the AI still defaults to assuming
20:11
that all those personas will in
20:13
good faith engage for the time
20:15
it takes to finish the interview.
20:17
Whereas like within the first three
20:20
interviews that we tested, a
20:22
CEO started swearing at the thing like
20:24
halfway through, you know, question four and dropped
20:26
out. By the way, he ended up
20:28
coming back and it was a very useful
20:30
interview. And so it all worked out
20:32
fine. But the
20:34
synthetic testers did not think to storm
20:37
out of the room
20:39
as part of their tests based on
20:41
their personality. Yeah. I don't know if,
20:43
if you've ever done this, but sometimes I just
20:45
have fun going into ChatGPT and trying
20:47
to get the last word and it never happens.
20:49
Right. You say, okay, bye. And it's like, all right,
20:51
see you. Uh, everything's fine. They don't give up. I
20:54
do think though, like the, the
20:56
tuning of the underlying, like normally you
20:58
use these evals just to build
21:00
the software. It's like you're writing a
21:03
custom workflow where you know
21:05
reasonably well what good looks like. And
21:07
then the question is, is our system
21:09
good? Here, you're also
21:11
designing an interview while you design the
21:13
system that can support interviews. And
21:15
the number of degrees of freedom is
21:17
super, super high. I think that's
21:19
common across anything voice and anything that
21:21
is conversational. The developers
21:23
working on chat, GPT,
21:26
have their work cut out for them to
21:28
figure out, are we having good conversations?
21:30
Do we mess up? Those are like really
21:33
fuzzy things to measure. Yeah,
21:35
you know, and I think too, one of the
21:37
learnings for me,
21:39
which is helpful especially because our use case is literally
21:41
helping people figure out where to, you know,
21:43
deploy agents or which agent use cases to
21:45
think about. We really are, you
21:47
know, there's all sorts of different
21:49
definitions of what exactly an agent means. But
21:51
I tend to come back to the
21:53
very, very kind of clear and simple way
21:55
that I think enterprises think about it,
21:58
which is AI is stuff that I use
22:00
to make my work better; agents are,
22:02
you know, things that do the
22:04
work for me. And that is very
22:06
crisp and clean in the context of this
22:08
voice agent where we are handing
22:10
a customer over to it to ask a bunch
22:12
of questions with information that we need
22:14
to get with no ability to intervene if
22:16
it goes off the rails or doesn't
22:18
do a good job. You know, like,
22:21
it's a small thing, it's,
22:23
you know, it's not all that risky, but
22:25
ultimately we're letting the agent do the
22:27
interview, and it really is a clearly different
22:29
thing than, you know, us using
22:31
ChatGPT to help prep for an interview or
22:33
something like that. And it turns out,
22:35
and Eddie, I think this is sort of
22:37
part of your point, literally as soon
22:39
as you are allowing a thing to go
22:41
do the thing, the degrees of freedom
22:44
just become so much more immense than the
22:46
normal software experience. And even in a
22:48
relatively constrained environment, like there's 20 questions that
22:50
we really need you to answer. Yeah,
22:52
I think a question on like everybody's
22:54
mind right now is, like, what is an
22:57
agent? Like, everybody's got this separate definition,
22:59
a separate way of framing the problem,
23:01
and it's just like a hot topic
23:03
in conversation right now. I think we both
23:05
agree that this one is a
23:07
highly agentic kind of example in a fairly
23:09
obvious way. I think we tend to
23:12
think of like agency as being this sort
23:14
of spectrum, like there are less agentic
23:16
things and there are more agentic things, and like
23:19
there are a few sort of sub attributes that
23:21
lead to something feeling more agentic. And like,
23:23
you know, one sort of element here is how
23:25
open -ended is the task? Like here it's completely
23:27
open -ended, right? Like you're given an interview, but
23:29
you can really vary what you're doing. Another
23:32
is like how complex is it? You know,
23:34
we have some open -ended tasks, but it's
23:36
like the task is spam detection. It's like
23:38
the eventual result is like, you know, is
23:40
this spam or is this not? This one
23:42
is super open -ended. You have very broad goals
23:44
you're defining. And then the last
23:47
one is sort of like, I think what you
23:49
were sort of talking about a second ago, which is
23:51
who's taking the action at the end of all
23:53
of this? You know, is there some system that's behind
23:55
the scenes, eventually making a recommendation to a person? In
23:58
this case, no, right? Like there's nobody sitting there
24:00
watching the interview. The person doesn't even get
24:02
involved until you're reviewing the results of the interview
24:04
and trying to synthesize it. Even then, I think
24:06
like that's on the to-do list to start
24:08
to tackle next, right? We're going to keep moving
24:10
through that and see how many places we can
24:12
apply agents in this process. So
24:14
as we kind of zoom out. having
24:17
gone through this experience, and obviously you're
24:19
bringing to bear tons and tons of different
24:21
projects at the same time, what
24:23
does this make you think about? Are
24:25
there other use cases that you're excited
24:27
about for voice agents, where you think that
24:29
companies should be really thinking about these
24:31
things? And maybe that's either specific use cases
24:33
or just types of problems or types
24:36
of opportunities that you think they're particularly well
24:38
suited for. Yeah, I think
24:40
inbound phone calls, and especially
24:42
within that spectrum, generally what you're
24:44
looking for is, what's the
24:48
50% of call volume that
24:48
is for very simple tasks? And
24:51
start with that with the ability to escalate
24:53
for the more complex things. So
24:55
that's one bucket. Another bucket
24:57
is outbound B2B calls. So
24:59
things like calling insurance companies to get, you
25:01
know, to gather information. That's
25:03
another big bucket. In general,
25:05
one of the best practices with this is, you
25:08
know, you always want the person who's talking
25:10
to the agent to know they're talking to an
25:12
AI agent and not to pretend that it's
25:14
a human. I think people are
25:16
very forgiving with being on the phone with AI
25:18
agents and they tend to be very positive
25:20
experiences, but I can imagine that hiding it
25:22
from a person would open
25:24
you up to a very bad experience. If
25:26
I just think back to my last week, what
25:28
I've seen in voice agents, they're all over
25:31
the place, and they're all super interesting in their
25:33
own way. We see folks
25:35
in health care that are currently doing
25:37
a bunch of... It's very similar to
25:39
your use case. It's someone conducting interviews
25:41
today. It's someone interviewing a bunch of
25:43
physicians to do market research. I
25:45
think it's open -ended, whether the right
25:47
answer there, in such a regulated place, is
25:49
to allow a voice agent to do
25:51
that, or if the voice agent's riding
25:54
shotgun and providing suggestions. But in
25:56
either case, seems like it can help
25:58
there. We've seen folks in the rail industry,
26:00
you know, going on trains doing safety
26:02
sort of inspections where, like, they're trying
26:04
to take notes on an
26:06
app today, and it's like super awkward. They're
26:08
like on a train interviewing a conductor,
26:10
talking out loud to them, but also trying
26:12
to take notes. And it's just a
26:14
bad UX. And so the agents sort of
26:16
guiding that is potentially a better experience. A
26:18
technician who's on site and needs to
26:20
refer to an instruction manual for this
26:22
big complicated piece of machinery. And instead
26:25
of trying to flip through the manual,
26:27
they could maybe interact via voice. Awesome.
26:29
Yeah. I mean, certainly I think
26:31
our experience has been immensely positive. Like I
26:33
said at the beginning, this is not
26:35
a one or two X improvement over the
26:38
alternative. It is a massive, you
26:40
know, it's, you can't even really calculate
26:42
it. Like it is, it was not possible
26:44
before to interview every single person in a company
26:46
about what they do and try to map
26:48
agent opportunities. It is now possible. Theoretically, if they
26:50
all did it at the exact same time,
26:52
it could all happen, you know, in a half
26:54
an hour. So, you know, we're super excited.
26:56
We love working with you guys on this. You
26:58
know, we're excited that more and more companies
27:00
are interacting with it, giving us more context to
27:03
learn from. Really appreciate the time today as
27:05
well to share it and excited to bring you
27:07
guys back as we continue to build this
27:09
out. Awesome. Thanks so much for having us.
27:11
Yeah, thanks for having us.