Episode Transcript
0:00
Welcome to the Analytics
0:02
Power Hour. Analytics
0:05
topics covered conversationally and
0:07
sometimes with explicit language. Hey
0:09
everybody, welcome. It's the Analytics
0:11
Power Hour and this
0:14
is episode 262. Hey, happy
0:16
New Year. You know, 2025.
0:19
That'll probably be the year
0:21
of, well, what exactly? And
0:23
there is a pretty steady
0:26
flow of prognostications every year
0:28
about the things that will
0:30
define the coming year. And
0:33
we're not, you know, completely
0:35
immune to the desire to define the
0:37
future. I didn't say that very
0:39
clearly, but we do want to
0:41
define the future. So what will
0:44
2025 bring? It's probably the year
0:46
of Tim Wilson still being frustrated
0:48
with people calling stuff the year
0:50
of... That's fair. Probably accurate, yeah.
0:53
You can try to end with
0:55
Tim being frustrated with people. You
0:57
don't really need to say, oh,
0:59
there you go. Further qualifiers on
1:01
it. Not necessary. We still like
1:04
you. And 2025, probably be the
1:06
year of Mo, still liking Adam
1:08
Grant and Brené Brown. Hey, Moe.
1:10
Yeah, probably, actually. That's a
1:13
very good prediction. There's
1:15
going to be a huge scandal with
1:17
one of them between like recording and
1:19
that coming out. Oh, jeez. It's going
1:22
to be. All right. Well, and I'm
1:24
Michael Helbling. Well, some attempts at categorizing
1:26
the future that is coming at us
1:28
awfully fast is definitely warranted. So what
1:30
better time than the first episode of
1:33
2025? You know, insert Zagger and Evans
1:35
pun here. And to do this right,
1:37
we wanted to have a guest who
1:39
has a great track record of observing
1:41
our industry and seeing where the puck
1:44
is going. Barr Moses is the co-founder
1:46
and CEO of Monte Carlo, the data
1:48
reliability company. As part of her role
1:50
as CEO, she works closely with data
1:53
leaders at some of the foremost AI-driven
1:55
organizations like Pepsi, Roche, Fox, American Airlines,
1:57
and hundreds more. She's a member of the
1:59
Forbes Council and is
2:02
a returning guest of the show.
2:04
Welcome back, Barr. Thank you so much. I
2:06
am honored and pleased to be a
2:08
returning member. No, we're serious. We
2:11
love the way that you take
2:13
such an interest in really having,
2:15
from your level, a real good
2:17
clear view of where our industry,
2:19
the data industry, is
2:21
going. Before we get started, let's
2:24
just get a recap of what's
2:26
going on with you and Monte Carlo.
2:28
Yeah, it's been a whirlwind
2:30
couple of years, not only for
2:32
Monte Carlo, but I'd say
2:34
for the entire data industry. Like
2:36
I'm just reflecting last time I
2:38
was here, this was 2021. It's
2:40
just kind of, you know, coming
2:42
out of COVID, I think we,
2:44
you know, we're all getting comfortable
2:46
behind the camera and feeling comfortable
2:48
at home and, you know, the
2:50
world is obviously very different today,
2:52
but maybe just kind of a
2:54
quick recap. You know, Monte Carlo
2:56
was founded to solve the problem
2:59
of what we call data downtime,
3:01
periods of time when data is
3:03
wrong or inaccurate. And, you know, five, ten
3:05
years ago, that actually didn't seem
3:07
important at all. Like, I think people
3:09
spent some time thinking about quality of
3:11
data, but you know, you guys know this better
3:13
than I do, but it probably didn't
3:16
get the diligence that it deserved back
3:18
then. Like, you could kind of like
3:20
skirt around the issue. You probably, you
3:22
know, it was very common at the
3:24
time to just have like... extra eyes
3:26
on the data to make sure that
3:28
a report is accurate. And if it
3:30
was wrong, you kind of were like,
3:32
oh, shucks, so sorry, and kind of
3:34
like move on. I also, but sorry
3:36
to interrupt, but I also think it
3:38
maybe wasn't as complex. And so like,
3:40
you know, as complexity has grown that
3:42
the ability to troubleshoot and dig
3:44
into the why it's not reliable
3:46
is even harder. But, sorry to
3:48
break your stride. Not at all. No,
3:50
I think that's spot on. And maybe just
3:53
to unpack that a bit, I think it was
3:55
less complex because one, the use cases were
3:57
limited, right? So today we'd call it a data
3:59
product, and have very fancy names for it, you
4:01
know, but the use case was maybe
4:04
just revenue reporting to the street, right?
4:06
And, you know, so these use cases
4:08
were fewer, the timelines were fewer, so,
4:10
you know, you maybe use data like
4:13
once a quarter to report the numbers,
4:15
and also there were fewer people working
4:17
with data. So maybe it's like a
4:20
couple of analysts under the finance team,
4:22
and so you really had a lot
4:24
more time, fewer use cases, less complexity,
4:26
and the stakes were lower.
4:29
Right? And so in all of those
4:31
instances, like, it kind of didn't really
4:33
matter if the data was accurate or
4:35
not. And then there was this big
4:38
wave of actually, like, people starting to
4:40
use data. Remember when people would say,
4:42
oh, we're data driven? And you kind
4:44
of, like, didn't really believe them. That
4:47
whole, there was a period back in
4:49
time, you know? And still happening. Still
4:51
happening. Totally agree with you. So I
4:53
think there was this like, you know,
4:56
big push and that's sort of when
4:58
Monte Carlo created the category of data
5:00
observability, which is basically allowing people creating
5:02
data products, whether those are data engineers,
5:05
data analysts, data scientists, anyone working with
5:07
data to make sure that they are
5:09
actually using trusted reliable data for that.
5:11
And sort of, you know, kind of
5:14
like helping when someone's looking at the
5:16
data and is like, WTF, the data
5:18
here looks wrong, you know, helping those
5:20
people answer the question of what's
5:23
wrong and why. That was sort of
5:25
kind of like the reason how Monte Carlo
5:27
was born. Now fast forward to today, I
5:29
can't believe it's almost 2025, it's like
5:32
four years since. You know, I like
5:34
to say that I think the data
5:36
industry is a little bit like Taylor Swift,
5:39
we kind of like reinvent ourselves every
5:41
year. We need, like, an Eras
5:43
Tour and kind of like go through
5:45
all the, you know, periods of time
5:48
of the data industry. And I think
5:50
the most recent era being swept by
5:52
generative AI, the implication of that means
5:54
that bad data is even worse for
5:57
organizations. And we can unpack what that
5:59
means, but at a very high level,
6:01
what Monte Carlo does is help organizations, you
6:03
know, enterprises, make sure that the data
6:06
that they're using to power their pipelines,
6:08
power their dashboards, power their generative AI
6:10
applications, is actually trusted and reliable. And
6:12
we do that by first and
6:14
foremost knowing when there's something wrong,
6:17
right? Like knowing if the data
6:19
is late or inaccurate, but then also
6:21
being able to answer the question
6:23
of why is it wrong? And how do
6:25
you actually resolve an issue?
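To make that notion of data downtime concrete, here is a minimal sketch in Python of the two checks being described: is the data late, and does today's volume look anomalous? The table name, metadata, and thresholds are illustrative assumptions, not Monte Carlo's actual implementation.

from datetime import datetime, timedelta, timezone

# Illustrative table metadata; in practice these numbers would come from
# warehouse queries (e.g., max(loaded_at) and daily row counts).
TABLE = {
    "name": "analytics.orders",
    "last_loaded_at": datetime(2025, 1, 1, 3, 0, tzinfo=timezone.utc),
    "rows_today": 4_200,
    "rows_recent_days": [98_000, 101_500, 99_300, 100_800],
}

def is_fresh(last_loaded_at, max_age=timedelta(hours=6)):
    # Freshness check: has the table loaded recently enough?
    return datetime.now(timezone.utc) - last_loaded_at <= max_age

def volume_ok(rows_today, history, tolerance=0.5):
    # Volume check: is today's row count within tolerance of the recent average?
    baseline = sum(history) / len(history)
    return abs(rows_today - baseline) <= tolerance * baseline

alerts = []
if not is_fresh(TABLE["last_loaded_at"]):
    alerts.append(f"{TABLE['name']} is late")
if not volume_ok(TABLE["rows_today"], TABLE["rows_recent_days"]):
    alerts.append(f"{TABLE['name']} row volume looks anomalous")
print(alerts or "no data downtime detected")

Detecting the anomaly is the "knowing something is wrong" half; the "why" half is the lineage and root-cause work described next.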
6:28
I'll sort of pause there, sort of a long answer and
6:30
a lot more that we can go
6:32
into, but whoa, it's been a fun
6:34
couple of years. Well, but also,
6:37
I mean, one, I guess just
6:39
to clarify, we're not saying that
6:42
in 2021 people weren't using data.
6:44
I mean, that's been ramping up
6:46
for a while. I think also
6:49
the modern data stack, I'm not
6:51
sure where that phrase was in
6:53
the... peak of inflated expectations versus, like, it
6:56
definitely I feel like since the
6:58
last time you were on the
7:00
modern data stack as a phrase
7:02
has slid into the trough of
7:04
disillusionment at least a little bit which
7:06
is kind of interesting I don't know
7:08
exactly how that applies to kind of
7:10
where we're going from here but I
7:12
feel like there was a point where
7:14
it was like if we just have all
7:16
these modules plugged in together with
7:18
the right layers on top of
7:20
them then like all will be
7:22
good and it feels like we're
7:24
a little past that, that
7:27
nirvana, even if we
7:29
got there, wouldn't actually
7:31
necessarily yield
7:34
the results that were being promised,
7:36
but yeah I mean I think Look,
7:38
putting myself in sort of the shoes
7:41
of data leaders today, you're facing a
7:43
really tough reality, because like every 12
7:45
to 18 months you're getting hit
7:47
with sort of a new concept. Call
7:49
it modern data platform, call it generative AI,
7:51
call it whatever you want. You're sort
7:53
of expected to be on top of
7:55
your game and sort of understand the,
7:58
you know, word or trend du jour. I
8:00
think if you sort of unpeel that
8:02
for a second and go back to
8:04
fundamentals, there are a couple of things
8:06
that I think remain true regardless and
8:09
have remained true for the last 10,
8:11
15 years, which is first and foremost,
8:13
like organizations want to use data and
8:15
data as a competitive advantage. How you
8:18
use it and in what ways, like
8:20
I think that is undisputable. Like strong
8:22
companies have strong data practices and use
8:24
that to their advantage. You can talk
8:26
about how, for example, you can use
8:29
it for better decision-making internally. That was
8:31
sort of one of the dominant use
8:33
cases in the beginning. You can use
8:35
it to build better data products. Like,
8:38
for example, you can have a better
8:40
pricing algorithm. And I think today, you
8:42
can talk more about this, but I
8:44
think data is the moat for generative
8:46
AI products and innovative solutions. And so
8:49
regardless of where the hype cycle is,
8:51
I think one core truth is that
8:53
data matters to organizations. How we use it
8:55
matters. And so data continues to be
8:58
a core part for organizations. I think
9:00
the second sort of fundamental truth that
9:02
we believe in is like reliable data
9:04
matters. Like, the data is worthless if
9:06
you can't trust what you're working with. Yeah, you know, like
9:09
it's, this even goes without saying, but
9:11
like having something that you can trust
9:13
in is sort of fundamental to your
9:15
ability to deliver it. And then I
9:17
think the third thing that sort of
9:20
always rang true is like innovation matters.
9:22
Like you have to be at the
9:24
forefront and so organizations that are doing
9:26
nothing about generative AI or doing nothing
9:29
to kind of, you know, learn what's
9:31
next will be at a difficult position.
9:33
I'm curious for your
9:35
takes. You know, one of the benefits of that was
9:37
that data leaders were met with many
9:40
solutions for many problems, but actually were
9:42
inundated with perhaps too many solutions. And
9:44
so ended up in a position where
9:46
they had to make bets on a
9:49
variety of solutions and ended up with
9:51
maybe sort of a proliferation of tools.
9:53
And now there's a big movement to
9:55
actually consolidate that or cut back to
9:57
what's necessary. And so if you're not
10:00
solving a core... fundamental truth, then
10:02
you probably don't deserve to
10:04
live in the modern data
10:06
stack, if that makes sense.
10:08
You don't deserve to live
10:11
in the modern data stack.
10:13
I'm sorry. I so deeply
10:15
love when the podcast intersects
10:17
with things that are like
10:19
completely churning through my brain
10:21
at the moment. And it
10:23
is like this beautiful like
10:25
chef kiss because these are
10:27
all kind of concepts that
10:29
I've been giving a lot
10:32
of thought to over the
10:34
break. I want to dig
10:36
into what you mentioned data
10:38
can be a moat. Can
10:40
you say more about that especially
10:43
you said I think relative
10:45
to gen AI? Yeah for sure I'm happy
10:47
to. So I think you know what's
10:49
happened... Let's sort of
10:51
think about, like, the last,
10:54
I want to call it a year or
10:56
two in generative AI. I'll actually
10:58
start by sharing a survey that
11:00
we did that I thought was
11:02
really, really funny. We basically interviewed
11:05
a couple hundred data leaders
11:07
and asked them what percentage
11:09
of data leaders are building
11:11
with generative AI. Can you guess
11:14
what percentage of data leaders?
11:16
Probably all of them are
11:18
saying that they are at least.
11:20
Really. Yeah, so like I think
11:22
like 97% like not a single
11:24
person. Yeah, you're just spot
11:27
on, Michael. Oh no, we're all
11:29
doing it for sure all doing it.
11:31
We're all doing it. We're all doing
11:33
it. Everyone! 2025 is the
11:35
year of maybe building with AI,
11:38
maybe. Maybe we're all doing it,
11:40
right? How often do you
11:42
do a survey and get
11:44
almost 100% response rate right
11:46
like for a question? It's
11:48
a pretty big outlier. Second question that we
11:50
asked was what percentage of you are like
11:53
do you feel confident in the data that
11:55
you have like do you trust the data
11:57
that you have that's running it? What do
11:59
you think is the percentage of people who
12:01
trust the data that they're using for
12:03
gen AI? 70%. That's not bad. It
12:06
was a 70? Okay. Because usually the
12:08
Duke business school used to do a
12:10
CMO survey every year and they would
12:13
ask data questions like that, and there
12:15
was usually about a 60% gap between
12:17
how important it is versus how much
12:20
they trusted it. It was always a
12:22
very big delta. So yeah. That's exactly
12:24
right. So 60% said they don't trust
12:26
it. So I think that's exactly the
12:29
delta. So only one out of three
12:31
trust and two out of three don't
12:33
trust the data. So it's interesting that
12:36
everyone is building generative AI, but no
12:38
one has the core component to actually
12:40
deliver said generative AI. I think
12:43
that speaks more to kind of human
12:45
nature, right? And what we want to
12:47
be, where we are. Can I ask,
12:49
this concept has been rolling around and
12:52
I've been like digging up old blogs
12:54
on it, but it just seems to
12:56
have dropped off. There was a lot
12:59
of hype I feel like it was
13:01
probably two years ago but I mean
13:03
the last four years have blurred together
13:06
so it could be anywhere between two
13:08
to six years about a metrics layer
13:10
right and it's I feel like I've
13:13
done all this like had to do
13:15
all this like mental processing around like
13:17
how does the metrics layer or semantics
13:19
layer differ from like a star schema
13:22
data warehouse to like have a reliable
13:24
data set, but it doesn't seem like
13:26
anyone is talking about that right now.
13:29
And I'm curious to hear your perspective.
13:31
Wow, that's a really good question. You
13:33
know, I think there's, you know, I'm
13:36
curious for your opinions, but I think
13:38
sort of going back to like, you
13:40
know, sort of the Taylor Swift kind
13:42
of analogy from before, there is this
13:45
like, like, I think there's this desire
13:47
to chase a shiny object right now.
13:49
And going back to this survey, like
13:52
if you're not talking about gen AI, you're
13:54
going to be left behind. And I
13:56
think there's a lot that goes into
13:59
delivering generative AI right now. We
14:01
can talk about what those things are.
14:03
And I'll go back to your moat
14:05
question for a second as well. But
14:08
I think if you're not on track or
14:10
have a really strong solid answer to how
14:12
you're on track, you're kind of on the
14:15
hot seat right now as a data leader.
14:17
And so I think that has just sucked
14:19
the air out of the room in every
14:21
single room where there is a data
14:23
leader or an executive leader. And I'll
14:25
explain what I meant by sort of data
14:27
as the moat. I think, if you
14:31
think about, like, what a data leader needs to
14:31
do now, basically like the first thing that's
14:34
being asked is, like, what models are
14:36
you using, you know, what foundational models are
14:38
you using, like what LLMs are you using,
14:41
etc., right? Like between, like, Open
14:43
AI and Anthropic, etc. There's lots
14:45
of options. The thing is every single
14:48
data leader today has access to
14:50
the latest and greatest model. Everyone
14:52
has access to that. And so I
14:54
have access to that, Michael, you have it,
14:57
everyone here has access to the models
14:59
that's like supported by 10,000 PhDs
15:01
and, you know, a billion GPUs, right?
15:03
And that is true for me and every
15:06
other company around me. So in that
15:08
world, how do I create something that's
15:10
valuable for my customers? How do
15:12
I create something that's unique?
15:14
Like, what is the advantage?
15:17
Like I can create a product just
15:19
like you can create a product and
15:21
so what's a distinguishment here? Like
15:23
why, you know, if like, for example,
15:25
if I'm a bank, how can I
15:28
offer a differentiated service if I have
15:30
access to the exact same model as
15:32
you do and the exact same ingredients
15:34
of a generative AI product, if that
15:36
makes sense? And so I think what we're learning
15:39
is that in putting together these
15:41
generative AI applications, which are today
15:43
really limited to chat bots, if
15:45
you will, or sort of
15:47
agentic solutions, etc. In all
15:49
of those instances, the way
15:52
in which companies make those
15:54
products personalized or differentiated
15:56
is by marrying, by
15:59
introducing their enterprise data, basically corporate
16:01
data. And so let's take a practical example. Like
16:03
let's say I'm a bank and I want to
16:05
build a financial advisor solution. I want to be
16:07
able to help Tim fill out his taxes. And
16:09
so I'm able to do that better if I
16:11
have data about Tim's background, his car, his house,
16:14
whatever it is. And so I can offer you
16:16
a much better differentiated product if I have reliable
16:18
data. about Tim that I can use. And so
16:20
that's the only difference between bank one and bank
16:22
two. It's what kind of data do we have
16:24
to power that product. Yeah, so just to summarize,
16:27
like we all have access to latest greatest models,
16:29
but the only thing that differentiates different generative AI
16:31
products is the data that's powering them. And so
16:33
that's why data is actually the moat in the world
16:35
of generative AI. But I mean, I guess counterpoint,
16:37
like I feel like that is coming from a...
16:39
That's coming from a super data-centric perspective. I mean,
16:42
and I guess this is what this is what
16:44
terrifies me is that year 2025 could be supercharging
16:46
this obsession with more, more, more, more, more data
16:48
as you throw more data in, then it's harder
16:50
to keep it clean. You've got more things that
16:52
can conflict. And so, absolutely, and we fought this
16:55
battle in the past where there's, you chase all
16:57
this data, because anytime something isn't seen as valuable,
16:59
the easy thing to default to is to just
17:01
to point to some data that's not clean enough
17:03
or not clean. It may be clean enough, but
17:05
it's never going to be perfectly clean or data
17:08
that's missing. And so that can feed this like
17:10
horrendously vicious cycle where we completely lose sight of
17:12
like, what we are trying to do, and just get
17:14
as much data as possible. Like the counterpoint is
17:16
those banks could differentiate by... thinking about with way
17:18
less data, what their customers really value, what they
17:20
most need, right? And it's not an
17:23
either or but if there is deep
17:25
understanding of their customer and
17:27
they value something, it
17:29
may need very little data.
17:31
It may be using
17:33
data they already have
17:36
in a different way.
17:38
So I think there
17:40
has to be that
17:42
balance. I would hope that
17:44
we get to that
17:46
point of like, we can't
17:48
just be in this
17:51
arms race for more and
17:53
more models, more data,
17:55
more, whatever. So, okay, Val,
17:57
unleash. Okay, so my
17:59
visceral reaction. My visceral reaction
18:01
is like, I can
18:04
absolutely see that some people
18:06
would use like what
18:08
you're saying, like the Gen
18:10
AI hype train to
18:12
be like, we need more
18:14
data. I don't think
18:16
that's what Barr is saying,
18:19
but I will obviously
18:21
give you the opportunity to
18:23
speak for yourself because
18:25
like my reaction is, but
18:27
it's not about the
18:29
quantity. It is about the
18:32
quality. Like it is
18:34
not about let's collect more
18:36
data. It's that we
18:38
have... the last few years
18:40
have been all about
18:42
like, let's have fucking data
18:44
lakes. Let's just dump
18:47
data from back end services
18:49
into anywhere and it's
18:51
created, I mean, I think
18:53
we've said a swamp
18:55
before, but it's like, you
18:57
can't ask important questions
19:00
like what do my customers
19:02
value if the data
19:04
that's there is a complete
19:06
trash fire and I
19:08
don't think it's about quantity. There's
19:10
also this distinction of
19:13
like, it is so easy
19:15
to say, I found
19:17
an error in the data.
19:19
This field is missing
19:21
or this field is incorrect.
19:25
Fix it as opposed to,
19:27
you just said if your
19:30
data is a dumpster, a
19:32
trash fire, there is a
19:34
gradation of which, so put
19:36
aside the more, more, more
19:38
data and bring in the
19:40
pristine data. That point, it
19:42
is so easy to find
19:44
a problem in the data
19:46
and chase that and extrapolate
19:48
from that. So absolutely we
19:51
need proper governance, but you
19:53
can chase either more, more,
19:55
more data, and there are absolutely,
19:57
you can Google for it
19:59
and find all sorts of
20:01
articles that say who's gonna
20:03
win are the ones who
20:05
collect all the data. You will find,
20:07
I completely grant you, there is garbage
20:09
in, garbage out. I mean, that is like a pat phrase
20:11
that may become my next favorite thing to hate on
20:14
after, again, "In God we trust, all others must bring data." Like,
20:16
it's so easy to say, garbage in, it's like, well,
20:18
people are not pouring garbage in. Yes,
20:20
there are errors. Yes, there is
20:22
process breakdown. Yes, there needs
20:24
to be governance and observability, but
20:27
it is so easy to say.
20:29
that if we're not getting value out,
20:31
oh, it's a data quality issue, and
20:33
now you can get equally obsessed
20:36
over chasing that. So, Mo, I feel
20:38
like you were putting, you were again
20:40
putting words in my mouth and like,
20:42
well, it's not bad at all. But.
20:44
No, no, no. I just, I think
20:47
sometimes, like when we're discussing this
20:49
concept there are, like, extremes, and,
20:51
says the one who said dumpster fire,
20:53
it sometimes is interpreted as a binary
20:55
thing and it's not like I do
20:58
think there is a spectrum it just
21:00
often happens that you're at one end
21:02
of the spectrum and I'm at the
21:04
other end but let me just elaborate
21:07
what I mean by quality because I
21:09
again can see a situation where a
21:11
business goes we must have perfect data
21:13
and that's not what I'm saying I'm
21:16
saying the data has to be meaningful
21:18
so that you can create connections between
21:20
different data sources and that the way
21:22
they relate to each other
21:24
is consistent so that like
21:27
different areas of the business
21:29
are not like tripping over
21:32
themselves making mistakes because
21:34
it's like fundamentally
21:36
so unstructured and so
21:38
like to me it's about how all
21:41
those things connect together. It's not
21:43
just about, like, is this number accurate
21:45
to the 99th percent or whatever.
21:47
It's, it's, I don't know, I'm
21:49
gonna just shut up and let
21:51
Barr talk because I feel like she
21:53
probably. No, I love this. I've
21:55
been, I love hearing all your thoughts.
21:57
I'm, I'm, yeah, I love it. Well, okay, so
21:59
a couple of things. One, obviously I'm biased, right? Like
22:01
I have a very data-centric view, but
22:03
I will not for a minute pretend
22:05
that I have anything but bias, right?
22:08
And I think my bias comes from
22:10
a place of like, yeah, I think
22:12
data is like the most interesting place
22:14
to be in in the past five,
22:16
ten years and in the next five,
22:18
ten. I think it's like the coolest
22:20
party that everyone wants to be a
22:22
part of. And like, they should. And
22:24
you know, I'll continue thinking that, you
22:26
know, I have strong, I, you know,
22:28
wake up every day and choose to
22:30
be part of the data party. And
22:32
I think it's where we're having fun.
22:35
So yes, I'm a 100. I'm a
22:37
100% biased. I agree with you, I
22:39
think data hoarding has been a huge
22:41
issue, a huge problem, and I think
22:43
it's been sort of a strategy that
22:45
has largely failed, like, oh, let's just
22:47
collect all the data and hope that
22:49
it solves, or, you know, think that
22:51
more data is more helpful. It's actually
22:53
interesting. I was just sitting down with
22:55
the founder of a data catalog company
22:57
a couple of days ago, and we
23:00
were talking about how 95% of
23:02
the
23:04
questions that people have of data have
23:06
already been answered. And so their challenge
23:08
is just finding the answer and surfacing
23:10
it. There's very, very few net new insights
23:12
being created, if that makes sense. And
23:14
so really their challenge is about how
23:16
do we help people, or users, discover
23:18
the answer versus create a new answer,
23:20
which is actually mind-blowing if you think
23:22
about like what a small percentage of
23:24
like new insights are generated, like it sort
23:27
of made me a little bit sad
23:29
for like you know the human race
23:31
but also happy that maybe we can
23:33
solve this, but you know I think
23:35
that I digress here, but my point
23:37
is I think what you're, what the
23:39
point that you're making Tim and Mo
23:41
is an important point. I am definitely
23:43
not, I don't think that more data
23:45
is necessarily better. In fact, I think
23:47
there are a lot of areas where
23:49
like less is better and like more,
23:51
you know, precise answers are better. For
23:54
a minute I'm not advocating for that,
23:56
not at all. I think what I
23:58
am saying is most of the, you
24:00
know, if you look at like chat
24:02
GPT or kind of things that like
24:04
anyone has access to. data that everyone
24:06
has access to. Like we can all
24:08
sort of, you know, it's funny, you
24:10
know, people used to say let me
24:12
Google that for you, and I
24:14
was trying to think what's the new,
24:16
like let me perplexity that for you.
24:19
I don't know, it doesn't, doesn't like
24:21
roll off the tongue just as
24:23
much. Yeah. Well let me ask Claude
24:25
would work, you know, so. Exactly, let
24:27
me ask what Claude says. But I
24:29
think everyone sort of has access. to
24:31
that. But if you have some data
24:33
about your users, right, let's take like,
24:35
I don't know, like a hotel chain
24:37
that's trying to create a personalized experience
24:39
for their users, like no one knows
24:41
as much as they do about, you
24:43
know, I don't know, the like, how
24:46
you like to travel, the kind of
24:48
food you like to eat, the kind
24:50
of, you know, ads that would speak
24:52
better to you. Not that I'm advocating
24:54
for like an ad-centric world, but my
24:56
point is like... The power today and
24:58
where I think the leverage lies in
25:00
is in having things that not everyone
25:02
has access to, the latest and greatest
25:04
LLM, so that cannot be your moat
25:06
or your advantage. By no means does that mean
25:08
that we have to have too much
25:11
data or a lot of data. I'm
25:13
not advocating for that, and I think
25:15
it's a very important clarification. I actually
25:17
will say that oftentimes... In the companies
25:19
at least that I work with, one
25:21
of the biggest challenges is that they
25:23
have so much data, they don't even
25:25
know where to get started. And so
25:27
a lot of the work is actually
25:29
saying, let's try to, you know, you
25:31
can think of like layers of important
25:33
data, tier one, two, three,
25:35
and think about like what's the core
25:38
data sets that we care about, making
25:40
sure that those are really pristine and
25:42
reliable. So oftentimes, like, actually starting small
25:44
is the winning strategy.
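As a rough illustration of that tiering idea, here is a sketch in Python with made-up table names (not any particular product's API): classify the data sets you have, then point your monitoring at tier 1 first instead of wall to wall.

# Hypothetical inventory of data sets, each assigned an importance tier.
DATASETS = {
    "finance.revenue_daily": 1,      # feeds revenue reporting: tier 1
    "marketing.campaign_spend": 2,
    "growth.experiment_events": 2,
    "scratch.tmp_exports": 3,
}

def monitored_first(datasets, max_tier=1):
    # Start small: only the most critical data sets get observed at first.
    return sorted(name for name, tier in datasets.items() if tier <= max_tier)

print(monitored_first(DATASETS))               # ['finance.revenue_daily']
print(monitored_first(DATASETS, max_tier=2))   # widen coverage deliberately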
25:46
I find when we work with a
25:48
company and the company is like, I want
25:50
to observe everything wall to wall, I'm
25:52
like, whoa, whoa, hold on. Like,
25:54
you're going to, that's going to be
25:56
really hard. Like, tell me why are
25:58
you actually using all of that data?
26:00
And that strategy often fails. And so
26:02
I'd much rather start with, what's a
26:05
small use case that you
26:07
actually really are using the data
26:09
for, and that's really important
26:11
for users? Let's start with like
26:14
making sure that that's really
26:16
highly trusted and reliable. So I
26:18
agree with you is my point here, and
26:20
I think it's an important
26:22
clarification. Moe, are you gonna...? No, I am
26:25
like waiting for the next like rant.
26:27
We can rant by the way. I'm
26:29
happy to rant about garbage in garbage
26:31
out. I think that is a great
26:33
rant. I'm happy to like, you know,
26:35
carry the torch on ranting against that,
26:37
Tim, if you'd like. I don't know
26:40
if you want to share why you
26:42
want to rant. I'm happy to share
26:44
my rant about it. Go for it.
26:46
So I'm curious, Tim, like, when
26:48
I said that stuff about
26:50
like connectivity, what are your views
26:52
on that? Because I
26:54
feel like you can
26:57
only answer important questions
26:59
if the data is, like, kind
27:01
of, I don't want to say
27:03
structured, but I'm thinking about
27:06
like bar's comment of, you
27:08
know, the competitive advantage that
27:11
you have is your data
27:13
set. Like, it's not the
27:16
models, right? So like, how...
27:18
how that all works together then to
27:20
me becomes the most important bit and
27:22
like I really like Barr's concept actually
27:24
someone in my team did this recently
27:26
the where they went through of like
27:29
what's tier one tier two tier three
27:31
and like I think it's such a
27:33
great framework to help the business understand
27:35
like the different levels of importance but
27:37
like Tim what's your thoughts on like
27:39
that connectivity piece? So, one, I mean, there
27:42
is nuance. I try to not
27:44
say things like it all has to be
27:46
connected or it's a dumpster fire or it's
27:48
perfectly pristine. And maybe I fell into it
27:50
a little bit and we chased the more
27:52
and the more and the more. But I
27:54
mean, I would love for there to be
27:57
a little bit more discipline and nuance, like,
27:59
Barr, when you
28:01
said, like, starting small. Like,
28:03
there is no pressure,
28:05
no force in business right now
28:08
that says when doing anything with
28:10
your data you should go
28:12
lock yourself in a room with
28:14
some smart people on a whiteboard
28:17
and then come out with a
28:19
mandate that it's an absolute minimalist
28:21
approach and then you build from
28:23
there because when you say something
28:26
what where And I feel like
28:28
I see this and I see
28:30
it, I mean, I'm spending too
28:32
much time on LinkedIn and reading
28:35
articles that if someone says, this
28:37
is data that we uniquely have
28:39
as a bank or a hotel
28:41
chain, therefore they make the leap
28:44
to we have it, therefore we
28:46
need to feed it in and
28:48
connect it because that is something
28:50
unique to us and therefore it
28:53
provides competitive advantage. And that there's
28:55
kind of a... that's the default
28:57
position is it's our unique data
28:59
we must use it and what
29:02
where I see that going wrong
29:04
is there's a missed step to
29:06
say like really like just because
29:08
we have it uniquely doesn't mean
29:11
it's necessarily valuable if somebody says
29:13
here's why we think it can
29:15
be valuable, what's our
29:17
minimum viable product what's our minimum
29:19
way to test that it would
29:22
be valuable But instead it kind
29:24
of is like, there is this
29:26
tendency to say, it's ours, put
29:28
it in the system, make sure
29:31
it goes through that it's pristine,
29:33
which when you flip it around
29:35
to LLMs, like, they're doing stuff
29:37
probabilistically, like hallucinations are coming out,
29:40
all of that's getting better, but
29:42
it's like, even with pristine data
29:44
going in, it's going to give
29:46
kind of inconsistent results. And we're
29:49
kind of like, oh, that's cool.
29:51
Well, it's like, well, then. I
29:53
can't remember who wrote, it might
29:55
have been Ethan Mollick or somebody
29:58
who pointed out, like, yeah, like,
30:00
data that's got noise in it,
30:02
putting it into something. It's not
30:04
that if you put pristine
30:06
data in, you're gonna get
30:08
a definitive, deterministic answer out.
30:10
If you put pristine data
30:12
in, you're gonna get a
30:14
probabilistic answer out. If you put
30:17
noisy data in, you're gonna
30:19
get probabilistic with a bigger
30:21
range of uncertainty. And I just,
30:23
I think there's just thought and
30:26
nuance to say if you had a
30:28
bias towards less. And it's not saying
30:30
don't do it. It's just saying move
30:32
with deliberation so that like
30:34
you figure out something is a tier
30:36
one and then you say that's tier
30:38
one. It's a differentiator. Lock that in
30:40
and make sure that it is clean
30:42
and when you're connecting it to something
30:44
else. You know, so that's... Well, I
30:47
guess that was, I was like, I'm
30:49
not gonna rant about this. I'm gonna
30:51
have a very nuanced thing to say
30:53
and then whoop here it comes
30:55
That was very eloquent. No, that
30:57
was eloquent. But okay, can I
30:59
add some color to the situation,
31:01
right? Like I feel like there
31:04
are some companies that still have
31:06
like a highly centralized model for
31:08
how they store their data or
31:10
how it's built, that sort of
31:12
stuff. Like my world is very
31:15
different to that. Everything's done completely
31:17
decentralized. So like in marketing we
31:19
have marketing analytics engineers and data
31:21
scientists creating data sets and then
31:23
over in the growth team there are
31:25
people creating data sets and over in
31:28
teams in education, and like... Even if
31:30
you start with that, like, let's
31:32
do something small, it's often created
31:35
in isolation. And the problem is,
31:37
is like, it's really hard to
31:39
answer a cross-cutting business question, like,
31:42
what's important to our customers or
31:44
what to our customers value, when
31:46
everything is built in this like
31:49
completely decentralized model, because like... If
31:51
I take my Tier 1 tables
31:53
and like data sets, that will
31:56
be completely different to another department's
31:58
Tier 1 data sets, like you
32:00
might not be able to answer that
32:02
question. I agree, like just to be
32:05
clear, I totally agree, I love this
32:07
idea of like starting with less, but
32:09
you can only start with less if
32:12
it is, I don't know if the
32:14
right word is like company wide or
32:16
like it's centralized. Like I feel like
32:18
there's this tension in how technology is
32:21
built in some companies. Can I quickly,
32:23
I'm going to admit this is unfairly
32:25
picking on an example that you just
32:28
threw out, that if it's like, what do
32:30
our customers value? And it's like, well,
32:32
I have to have all the data
32:34
and hook it all together, or I
32:37
could field a study and ask them.
32:39
You know, like, there is that, there's
32:41
this story out there of, I'm going
32:44
to plug in, I'm going to launch
32:46
my internal AI and I'm going to say,
32:48
what are our customers value the most?
32:50
And then through all of this magic,
32:53
it's going to generate it. And you
32:55
say, well, why, it has to
32:57
connect all of this stuff? If that's
33:00
a fundamental question, then there are alternative
33:02
techniques that have been around for 50
33:04
years, which is usability testing or focus
33:06
groups or panels for some of that.
33:09
That's unfair because you just yank that
33:11
out as one example. So I'm going
33:13
to acknowledge fair point. But yes, I
33:16
agree that there are other research methods
33:18
that would be more appropriate there. Again,
33:20
I'm going to shut up and let
33:23
Barr speak. No, not at all. I
33:25
love this. I feel like I'm asking
33:27
questions that I haven't thought of in
33:29
a while, so that's good. No, I
33:32
mean, listen to this, my reaction is
33:34
a couple of things. One is, you
33:36
know, going back to sort of data leaders
33:39
being faced with sort of a
33:41
really tricky part of their journey, I
33:43
think. And you talked a little bit
33:45
about sort of what does a great
33:48
model look like for a team? Like
33:50
is it sort of centralized or decentralized?
33:52
And I think organizations, like, go back and
33:55
forth on that. And it also is
33:57
a little bit of like a function
33:59
of the environment in which they operate.
34:01
We work with highly regulated companies who
34:04
operate in a highly regulated environment. So
34:06
think like financial services or health care
34:08
or anything like that. And in those
34:11
instances, they are actually subject to significant
34:13
regulations and audits. And in those instances,
34:15
you really need to have really strong
34:17
data management and data quality controls in
34:20
place. And oftentimes that needs to be
34:22
across your entire data estate. And that
34:24
is sort of like table stakes.
34:27
You can't really operate without that. And
34:29
I think that's very different from you
34:31
know like a retailer organization or retail
34:33
company or you know an e-commerce company
34:36
so you know first and foremost I
34:38
think this is really dependent on where
34:40
what the environment you're operating and also
34:43
what problem are you trying to solve
34:45
when you know when we say data
34:47
products or generative AI applications it's very
34:49
broad and I think if you really
34:52
think about what actually is being used
34:54
there's a couple of things one is
34:56
like creating you know a personalized experience
34:59
for your customers, but it can also
35:01
be inwardly looking for a company sort
35:03
of automating internal operation. So an example
35:05
of Fortune 500 company that we work
35:08
with, they have a goal to have
35:10
their IT organization, 50% of their IT
35:12
work needs to be either completely AI
35:15
automated or AI assisted. That's sort of
35:17
their goal. And that's in terms of
35:19
internally automating sort of human manual tasks.
35:21
And so, you know, I think it
35:24
sort of depends on what you're trying
35:26
to solve. And I think that's sort
35:28
of what data leaders need to ask
35:31
themselves today. Maybe sort of one thing
35:33
that's coming out of that is I
35:35
think there's this sort of blurring line
35:37
between different people working with data. So,
35:40
you know, in the past, there's sort
35:42
of, you know, you could really draw
35:44
the lines, I think more clearly between
35:47
engineers, data engineers, analysts, data scientists, all
35:49
of that is becoming a lot harder
35:51
to distinguish and I think my view
35:53
is sort of in you know the
35:56
teams that will be building generative applications
35:58
will be a mix of that. So
36:00
it will include both engineering and data
36:03
people. I don't think, I think, you
36:05
know, how does this work? Like someone
36:07
wakes up at a data company and
36:09
is like, hey, CTO, go build a
36:11
generative application. And so like a bunch
36:13
of engineers like run off and build
36:16
something. And then someone's like, hey, CDO,
36:18
go build a generative application. And so
36:20
like a bunch of data people like run
36:22
off and like build stuff. And so
36:24
you end up having data teams trying
36:26
to build stuff. But at the end
36:29
of the day, like a strong generative
36:31
AI application or any data product
36:33
needs a good UI, which should be built by
36:35
software engineers. Like, you're not
36:37
gonna, like, that's not the data
36:39
team's job. And it also needs, like,
36:41
good data pipelines and reliable pipelines. And
36:44
that doesn't make sense. Like, you don't
36:46
need, you know, a front end engineer to
36:48
build, like, a data pipeline. And so I
36:50
think at the end, there will be some
36:52
convergence of, like, what the roles are,
36:54
but right now there's
36:57
a lot of people sort of
36:59
crossing lines and lots of blurry
37:01
lines in between and what's your
37:03
perspective on data products being more
37:05
like a platform product, like,
37:07
versus I don't know I feel
37:10
like there's been... There are many
37:12
kind of ways you could cut
37:14
it, right? Like sometimes data products
37:16
seem to sit more in like
37:18
a marketing technology space or whatever,
37:20
but like it seems at the
37:23
moment there is kind of a lot
37:25
of perspective about it really sitting in
37:27
like that product platform sphere and like
37:29
product pms are quite different as well
37:32
to like a customer facing product manager.
37:34
Yeah, I mean, I think if you
37:36
look at like the product, oh, go
37:38
for it Tim. Well, I just want
37:40
to clarify. So when you say a
37:42
platform, are you saying the data product
37:45
is a platform that then gets kind
37:47
of, winds up serving a bunch
37:49
of different use cases? Are you
37:51
saying just where, are you saying
37:53
organizationally? Are you saying what the data
37:55
product is a platform with a bunch of
37:57
features? Like, what do you mean by that? Yeah, when I
38:00
say platform product, I'm more meaning like
38:02
the products that you build suppose in-house
38:04
that serve as like the platform for
38:06
internal stakeholders and like the tools that
38:08
you're building to service your organization and
38:10
I suppose like as I'm saying this
38:12
out loud I'm like I suppose you
38:15
could have data products that would be
38:17
doing that you could also have customer
38:19
facing data products and those things would
38:21
probably be different. Oh wow, I really
38:23
answered my own question there haven't I?
38:25
No, it's okay. I can elaborate, but
38:28
I think you did. You did answer
38:30
parts of it. So maybe also just
38:32
like to get a step back for
38:34
a second, if you think about data
38:36
products and where they are in the
38:38
hype cycle, like I think this sort
38:40
of like, you know, it's like there's
38:43
this hype and then they plateau and
38:45
then you're like, oh, now I can
38:47
actually make this product. There's like, oh,
38:49
now I can actually really use this
38:51
thing, which is good, I think, I
38:53
think data products can really mean whatever
38:56
you want. It can both be, it
38:58
could be, you know, let's walk through
39:00
a simple example like an internal dashboard
39:02
that like, you know, the chief marketing
39:04
officer is using every day, right? And
39:06
so it's basically like a set of
39:08
dashboard or a set of reports, and
39:11
then there's a lot of like tables
39:13
behind this, you know, following a
39:15
particular lineage that feed into that report.
39:17
And so it could be a combination
39:19
of you know, user attributes and sort
39:21
of different information about those users and
39:24
also some user behavior and could be
39:26
a bunch of sort of, you know,
39:28
different third party data sources. And so
39:30
all of that can be part of
39:32
a data product. So from, and you
39:34
can describe that as basically like all
39:36
the assets that are contributing to said
39:39
report or dashboard that the CMO is looking
39:41
at. My point is, you can basically
39:43
use data products as a way to
39:45
organize your data assets and to also
39:47
organize your users and data teams. And
39:49
so to me, it's less of a
39:52
question of, you know, is this part
39:54
of a platform or not? Because that
39:56
varies, as I mentioned by the organization,
39:58
the size of maturity of the organization.
40:00
For me, it's more a way for
40:02
companies to organize what they care about.
40:04
And so oftentimes, you know, if we
40:07
will work with a data platform team,
40:09
we'll say, hey, like, what's the data that
40:11
you care about? And then they might
40:13
tell us, oh, you know, we have
40:15
a marketing team. And, you know, that
40:17
really focuses on, you know, our ads
40:20
business. And the CMO there looks at
40:22
this dashboard every morning and they are
40:24
so sensitive to any changes that they
40:26
have there. And so we want to
40:28
make sure that all the data pipelines
40:30
from ingestion, third-party data sources, through transformation,
40:32
through to that report, we want that
40:35
to be very high quality and accurate.
40:37
So we want to make sure that
40:39
that entire data product is trusted. That's
40:41
like one way to think about it.
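Here is a minimal sketch, in Python with invented asset names, of that way of looking at a data product: the CMO's dashboard plus the lineage of assets feeding it, so trust checks or incidents can be scoped to the whole product at once.

# A data product = the output (the CMO dashboard) plus the lineage of
# assets feeding it. Each entry maps an asset to its direct upstream sources.
LINEAGE = {
    "dashboard.cmo_daily": ["mart.ads_performance"],
    "mart.ads_performance": ["stg.ad_spend", "stg.user_attributes"],
    "stg.ad_spend": ["raw.third_party_ads"],
    "stg.user_attributes": ["raw.app_events"],
}

def upstream_assets(asset, lineage):
    # Walk the lineage graph and collect everything the asset depends on.
    seen, stack = set(), [asset]
    while stack:
        for parent in lineage.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Everything that must be healthy for the CMO's dashboard to be trusted:
print(sorted(upstream_assets("dashboard.cmo_daily", LINEAGE)))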
40:43
Now the ownership of those assets can
40:45
be by the data platform itself or
40:48
it can be by the data analysts
40:50
that are actually running the reports. Oftentimes
40:52
it's a combination of both. So you
40:54
might have data analysts looking at the
40:56
reports, the data platform running the pipelines,
40:58
then a totally separate engineering team that's owning
41:00
the data upstream and sort of the
41:03
different sources. And so oftentimes it's actually
41:05
all of them are contributing to sort
41:07
of a set data product, if you
41:09
will. But to me, where data products
41:11
are most useful is in a way
41:13
to organize data assets and organize a
41:16
view of the world for a particular
41:18
domain, for a particular business outcome, if
41:20
that makes sense. Do the data product,
41:22
this is I guess for both of
41:24
you, data product, product managers, like what's
41:26
the breadth, do they go, do they
41:28
engage all the way up to the
41:31
upstream engineering owning the data creation all
41:33
the way through to the
41:35
use case and the need or does
41:37
it, like where do, is there a
41:39
natural cutoff where they say, this is
41:44
now engineering's problem? They're just,
41:44
they need to be managing the data
41:46
coming in, or like how broad
41:48
does that role go assuming it I
41:50
guess maybe there's a precursor question. Does
41:52
that role get defined and exist as
41:54
you are a data product product
41:56
manager for this data product or set
41:59
of data products? And if so,
42:01
what's the scope of that role? Yeah, doesn't
42:03
it depend on the organization? Like,
42:05
I mean, we're having lots of
42:07
conversations at the moment, because like
42:09
I said, we have a decentralized
42:11
model, which is quite unique, right?
42:13
Because like, well, it's not unique,
42:15
but like, it creates different layers
42:17
of accountability, right? Because like, if you
42:20
have engineers that have a back-end
42:22
service and they're pushing that data
42:24
to you and then you're building
42:26
a data product off it, like...
42:28
The question that comes to mind
42:30
for me is like who's accountable?
42:32
Well, like, it's not an easy answer in that
42:34
model. I think it's a responsibility
42:36
of the team that are in the
42:38
back-end service to make sure that the
42:41
data is getting pushed out correctly, but
42:43
then likewise for the people who are
42:45
receiving it, like they have layers of
42:47
accountability as well as the people who
42:49
are using that data, but like in
42:52
a completely different model where you don't
42:54
have that, like you have a more
42:56
centralized model, those lines of ownership could
42:58
be different, right? And so I think
43:00
it's so dependent on the
43:03
company and how they're structured to understand
43:05
where something starts and ends.
43:07
I think it's probably... impossible
43:10
to think that a data
43:12
product PM would own everything
43:14
completely end to end. Like,
43:16
I can't envisage a world
43:19
where that would happen just
43:21
because there are so many
43:23
different parts to it.
43:25
Like, I don't know. Anyway, I'm
43:27
not making a lot of sense
43:30
now. Yeah, yeah. I mean, this
43:32
is a maybe, you know, not
43:34
what you'd want to hear, but
43:36
I think it's an "it depends" answer. Like
43:38
it depends on the maturity of,
43:40
I mean I don't want to repeat
43:42
what Mo said, but I strongly agree
43:44
with that. It's hard to draw the
43:47
lines. I think some of the teams
43:49
that do this better are those
43:51
that are able to have like
43:53
a strong data governance team that
43:55
can actually sort of clearly sort
43:57
of lay out what that looks
43:59
like. You know, the most common model
44:01
is something like a federated model where
44:04
you have a centralized data platform, like
44:06
what you said, Mo. The centralized data
44:08
platform sort of defines what excellence looks
44:10
like, what great looks like. And so
44:13
they might define like, these are the
44:15
standards for security, quality, reliability, and scalability.
44:17
And so whenever you're building a new
44:19
data pipeline or adding a new data
44:22
source, you need to make sure that
44:24
it passes these requirements on each of
44:26
those elements. And so in that way,
44:28
like the centralized data platform defines what
44:31
great looks like. And then no matter
44:33
what team you're on, this could be
44:35
the data team serving the marketing team
44:37
or finance team or sort of whatever
44:40
use case it is. We adhere to
44:42
the same requirements that the centralized team
44:44
has defined.
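A small sketch of that federated pattern, in Python with invented requirement names and values: the central platform team publishes the standards once, and each domain team's proposed pipeline is checked against them before it ships.

# Standards the centralized data platform team defines once; the keys
# and thresholds here are invented for illustration.
STANDARDS = {
    "quality_null_rate_max": 0.01,
    "reliability_freshness_sla_hours": 6,
}

def violations(pipeline_spec):
    # Check a domain team's proposed pipeline against the central standards.
    failed = []
    if not pipeline_spec.get("pii_encrypted"):
        failed.append("security: PII must be encrypted")
    if pipeline_spec.get("null_rate_max", 1.0) > STANDARDS["quality_null_rate_max"]:
        failed.append("quality: null-rate budget too loose")
    if pipeline_spec.get("freshness_sla_hours", 999) > STANDARDS["reliability_freshness_sla_hours"]:
        failed.append("reliability: freshness SLA too slow")
    if not pipeline_spec.get("partitioned"):
        failed.append("scalability: table must be partitioned")
    return failed

# A marketing-team proposal, validated the same way no matter which team submits it:
proposal = {"pii_encrypted": True, "null_rate_max": 0.05,
            "freshness_sla_hours": 4, "partitioned": True}
print(violations(proposal))   # ['quality: null-rate budget too loose']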
44:46
So we see a lot of that. I think that's, again, with
44:49
generative AI, we will see more of
44:51
that because maybe going back to sort
44:53
of what we said at the very,
44:55
very beginning of the call, how we
44:58
use data 10 years ago was a
45:00
lot simpler. There were very few use
45:02
cases and very few people using data.
45:04
Now, the need for a centralized, you know,
45:07
sort of governance definition is more important.
45:09
I mean, this is also, you know,
45:11
you kind of see this, I think
45:13
the sort of, you know, LLM or
45:16
generative AI stack is still being defined,
45:18
but, you know, one of the questions
45:20
you raised, Tim, was, you know,
45:22
hallucinations are very real, right? And, you
45:25
know, when you release a product and
45:27
the data is wrong, you know, it has a
45:29
colossal impact both on your revenue
45:31
and your brand. You know, maybe the
45:34
example that I like to give
45:36
the most is, I don't know if
45:38
you all saw this, sort of went
45:40
viral on Twitter or X. I'm not
45:43
going to get used to that thing,
45:45
but it went viral on X. You
45:47
know, someone did this thing on Google,
45:49
like basically the prompt was something like,
45:52
what should I do if my cheese
45:54
is slipping off my pizza? And the answer was like,
45:56
oh we should just use organic superglue.
45:58
And, you know, the... Oh wow! It's
46:01
obviously a bad answer, right? And honestly,
46:03
I think Google can get away with
46:05
it because of such strong brand that
46:07
Google has these days. And so, yeah,
46:10
I'll probably continue to use Google even
46:12
though they gave me a shit answer
46:14
about like organic super glue for my
46:16
pizza. But most brands, if I'm, you
46:19
know, an esteemed bank or an airline
46:21
or a media company, I can't afford
46:23
to have... those kind of answers in
46:25
front of my users. And so like
46:28
actually getting that in order is, you
46:30
know, again, Google can get away with
46:32
it, but like 99% of us cannot.
46:34
Nice. I want to switch gears just
46:37
a little bit and talk about something
46:39
else that kind of obviously ties in,
46:41
but also kind of reintroduces a lot
46:43
of challenges, which is unstructured data. And
46:46
going into next year, one of the
46:48
articles I was reading that you'd written
46:50
Barr, was kind of like saying this
46:52
is going to be one of the
46:55
things. Could you kind of give a
46:57
perspective about okay so we're going to
46:59
be using a lot more unstructured data
47:01
but then, doesn't that... how
47:04
do we then take all the things
47:06
we've just been discussing about how challenging
47:08
data is and now we're just going
47:10
to slam on a new set
47:13
of challenges on top of that that are
47:15
going to kind of re-do the whole
47:17
thing. Like, what do people
47:19
do about this? We should do at
47:22
some point like a 2025 will be
47:24
the year of... and see
47:26
what we come up with. I don't
47:28
know, maybe it'll be a little round-robin.
47:31
Yeah, exactly. You ask Claude, I'll ask
47:33
Perplexity, you use ChatGPT, please. Yeah, exactly.
47:35
Exactly. I mean honestly if like if
47:37
we could foresee that we probably wouldn't
47:40
be in this business right we'd be
47:42
doing something else if we could be
47:44
forecasting that but I think as will
47:46
2025 be the year of unstructured data
47:49
I don't know but I can tell
47:51
you this for the last 10-15 years
47:53
most of the data work has been
47:55
done with structured data and structured data
47:58
is very easy. It's like, you know,
48:00
data that's like in rows, columns, tables
48:02
that you can analyze in a pretty
48:04
straightforward way with a schema and most
48:07
of like the modern data stack and
48:09
whatever solutions that we all use and love
48:11
on day-to-day has been focused on structured data.
48:13
That being said, if you look at where
48:16
the growth is, I think there's like, you
48:18
know, some crazy estimates from Gartner, you know,
48:20
like 90% of the growth in data will
48:23
come from unstructured data, something
48:25
like that, or, you know, and
48:27
just to define when, you know,
48:29
when we talk about unstructured data,
48:32
things like text, images, etc.
48:34
Well, 80% of that unstructured
48:36
data will be generated by an LLM, so,
48:39
no, I'm... You know, it's turtles all the
48:41
way down, like, you know what I mean. You
48:43
know, I think the former founder of Open
48:45
AI said something like we're at peak
48:48
data in AI now, right? Like we're at
48:50
the time, we're like, this is the most
48:52
data that we have to train, and
48:54
from now on, we're going to have
48:56
to rely on synthetic data in order
48:58
to do that. So, you know, and
49:00
that goes back to your question of
49:03
like hoarding data. But going
49:05
back to the unstructured point,
49:07
I think, you know, unstructured data
49:09
is becoming more and more
49:11
and more and more important, and what
49:13
to do with it. You know, I think this is
49:15
very early days for this space and I think
49:17
we're still sort of watching and kind of understanding
49:20
what's happening. But I think one of the
49:22
things just to make this really concrete with
49:24
an example, I think is a cool example.
49:26
You know, we work with a company that's
49:29
a Fortune 500 insurance company and one
49:31
of the most important types of data
49:33
for them, unstructured data, is actually
49:35
customer service conversations.
49:38
So like, let's say, you know, I have a
49:40
policy or something that I'm upset with and I
49:42
want to chat with someone and then have this
49:44
conversation and you know you can analyze that
49:46
conversation to understand my sentiment to you know
49:48
how pissed off am I like am I
49:50
like yelling "representative, representative," like, I don't know,
49:52
I'm like getting my manager or whatever it
49:54
is or you know I'm like super happy
49:56
thank you so much right like that's what
49:58
I mean by saying sentiment. So you
50:00
can sort of analyze like what is
50:03
a conversation like and and basically you
50:05
know you can also ask the user
50:07
for feedback right like sort of scoring
50:10
that. One of the things that this
50:12
customer does is actually use an LLM to create
50:14
structure for this unstructured data. What do
50:16
I mean by that? They basically take
50:19
a conversation and then score that conversation.
50:21
So like zero to ten, this conversation
50:23
was a seven or an eight or
50:26
something like that. Now what's the problem?
50:28
The problem is that sometimes LLMs hallucinate,
50:30
and they might give a score that's,
50:32
let's say, larger than 10. What does
50:35
it mean if a
50:37
conversation scores a 12, for example,
50:39
right? So actually, like, the way in
50:42
which we were working with this company
50:44
is allowing them to observe the output
50:46
of the LLM to make sure that
50:49
the structured data is within the bounds
50:51
of what a human would expect to
50:53
score the unstructured data, which is the
50:55
customer conversation. And so in that instance,
50:58
we're sort of using automation in a way
51:00
that we maybe hadn't expected before in
51:02
order to add value and to sort
51:05
of, you know, in this instance,
51:07
actually like reduce the cost and improve
51:09
the experience for the users in this case.
51:11
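[For illustration only, not the customer's actual implementation: a minimal Python sketch of the pattern described here, where an LLM turns an unstructured conversation into a 0-10 score and an observability check catches out-of-range outputs like that 12. `call_llm` is a hypothetical stand-in for whatever model API is in use.]

```python
from dataclasses import dataclass

SCORE_MIN, SCORE_MAX = 0, 10

@dataclass
class ScoredConversation:
    conversation_id: str
    score: int | None    # None means the model's output failed validation
    needs_review: bool   # True routes the conversation to a human

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    raise NotImplementedError

def score_conversation(conversation_id: str, transcript: str) -> ScoredConversation:
    raw = call_llm(
        f"Rate this customer-service conversation from {SCORE_MIN} to "
        f"{SCORE_MAX}. Reply with the number only.\n\n{transcript}"
    )
    try:
        score = int(raw.strip())
    except ValueError:
        # The model replied with something that isn't a number at all.
        return ScoredConversation(conversation_id, None, needs_review=True)
    if not SCORE_MIN <= score <= SCORE_MAX:
        # The hallucination case from the episode: a "12" on a 0-10 scale.
        return ScoredConversation(conversation_id, None, needs_review=True)
    return ScoredConversation(conversation_id, score, needs_review=False)
```

[The `needs_review` flag is one way to keep a person in the loop for out-of-bounds outputs, which is exactly where the conversation turns next.]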
But that brings up the case of: say that scoring model, it just shits the bed 10% of the time, but it does way better 60% of the time, does about the same as a human the rest of the time, and it's overall a little bit cheaper. Like,
51:30
I think there are
51:32
tradeoffs, and I mean, maybe
51:34
this goes back to the earlier discussion,
51:37
that if it's like, well, we're gonna pull out the one that it said was a 12 and say, you've got to keep that from happening. That's one approach: make this never happen. The other option is: it's going to happen. So the
51:50
process needs to be human in the
51:53
loop or human on the loop. Like
51:55
don't completely hand this over, so
51:57
that you can catch the ones, because a human would catch it. And there are the tradeoffs, and, you know what, maybe they're even, you know, okay.
52:06
You're gonna have a small percentage who
52:09
are totally pissed off, even if you're
52:11
just running humans, because their wait time
52:13
is too long or something else. Is
52:15
your goal to have every customer have
52:18
a delightful experience? It may be a
52:20
different set of customers that are having
52:22
a horrible experience. And then probably, Moe,
52:25
if you're connected. You want to make
52:27
sure the ones with the highest predicted
52:29
lifetime value. You're not saying, great, we
52:31
have way fewer customers that are pissed off.
52:34
Unfortunately, it tends to skew towards the
52:36
ones that are the highest, you know,
52:38
lifetime value. So, I think that's, yeah,
52:41
I mean, I think that's spot on.
52:43
And I think it's, I mean, one
52:45
of the questions that I remember sort
52:47
of thinking through is, like, what's better: no answer or a bad answer? You know, and
52:59
I'm not sure. I can tell you,
53:01
we're not creating, you know, sort of
53:03
agents, if you will, in order to
53:06
say, oh, I don't know, right? That's
53:08
not how you create them. But oftentimes,
53:10
like, that actually might be the better
53:13
answer. I think Tomasz, who we, you
53:15
know, sort of collaborated with on, you
53:17
know, predictions for next year. So to
53:19
us, like, you know, what you'd expect
53:22
is like 75 to 90% accuracy is
53:24
considered like state of the art for
53:26
AI. However, what's often not considered, I
53:29
mean, on the face of it, 75
53:31
to 90% seems, you know, really legit
53:33
and reasonable, but what's not
53:35
considered is like, if you have three
53:38
steps, and each at 75
53:40
to 90% accuracy, the combination of
53:42
that is actually an ultimate accuracy of only
53:45
50%, which is, by the way, like,
53:47
worse than a high school student would
53:49
score in that sense. And so is
53:51
50% acceptable? Probably not.
53:54
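[Back-of-the-envelope arithmetic behind that three-step point, for illustration: at 75-90% per step, three compounded steps land right around the ~50% figure cited here.]

```python
# If each of three pipeline steps is independently 75-90% accurate,
# end-to-end accuracy is the product of the per-step accuracies.
for per_step in (0.75, 0.83, 0.90):
    print(f"{per_step:.0%} per step -> {per_step ** 3:.0%} end to end")
# 75% -> 42%, 83% -> 57%, 90% -> 73%; the middle of that range is
# roughly the 50% mentioned here.
```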
And so what ends up happening is actually what I
53:56
think we were seeing in the market is,
53:58
like, the market actually took this
54:01
big step back. Like I think a
54:03
year ago, there was this huge rush
54:05
to adopt generative AI and
54:07
to try to build solutions. But as
54:10
we were seeing that the accuracy is
54:12
sort of, you know, at those ranges,
54:14
companies did take a step back and
54:17
actually are reevaluating or rethinking where to
54:19
place their bets or chips, if you
54:21
will. I still find that most companies
54:23
evaluate a solution with a
54:26
human thumbs up or thumbs down like
54:28
was this answer good or not, and
54:30
allowing users to just mark like yep
54:32
this was great or no this kind
54:35
of sucked. Companies still have that,
54:37
and I don't think we're moving
54:39
away from that you know unless
54:41
there's sort of a big, big change
54:43
in the near future. I
54:45
have a totally unrelated random question
54:47
Barr. With the companies you're
54:49
working with, is the focus of
54:52
reliability and the work you do
54:54
quite different depending on whether data
54:56
is structured or unstructured like in
54:58
the use case you just gave
55:00
like it sounded like it was
55:02
quite different but like what are
55:04
you seeing across the industry? Yeah
55:07
100% like I think the use cases
55:09
that we cover vary tremendously
55:11
based on industry and company
55:13
and I think that's a
55:15
reflection of the variability in what
55:18
you can do with the data across
55:20
the industry. So it can range, you
55:22
know, the types of products that
55:24
we work with can be, you
55:26
know, data products that are more
55:28
like a regulatory environment
55:30
wherein, you know, one mistake in
55:33
the data could actually put you at
55:35
risk of regulatory fines. You know, if
55:37
you are using data in some incorrect
55:39
way, or not following what is defined
55:41
as sort of best practices for data
55:44
quality, sort of like this blanket statement
55:46
that's very high level, but actually like
55:48
is very important in these environments. That's
55:50
like one. The second can be where
55:52
you have a lot of internal data
55:55
products, so you know, like a lot
55:57
of reporting or you know, product organizations
55:59
that are... you know, doing analysis based
56:01
on cohorts or segmentation of your user
56:03
base, you know, a third could be
56:05
data products that are sort of customer
56:08
facing. So for example, if we have
56:10
like, you know, the easiest thing that
56:12
is, like, Netflix, you know, recommending,
56:14
you know, your next best view, for
56:16
example. And then a fourth use case could be, you
56:21
know, a generative data application. So for
56:23
example, like an agent chatbot that
56:25
helps you ask and answer questions about,
56:27
you know, your internal process or your
56:29
internal data. So you can ask really
56:32
basic questions like, you know, how many
56:34
customers do we have? And, you know,
56:36
how many customers have renewed in the
56:38
last few years? Or,
56:40
if I'm in support, I can ask
56:42
how many support tickets has this customer
56:45
submitted in the last year and in
56:47
what topics and, you know, what was
56:49
their C-SAT, sort of questions like that.
56:51
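[One more illustrative sketch, not a description of any vendor's product: the internal-data chatbot pattern, assuming a hypothetical `ask_llm_for_sql` helper, with the same kind of guardrail on generated output that came up earlier applied to generated SQL.]

```python
import sqlite3

def ask_llm_for_sql(question: str, schema: str) -> str:
    # Hypothetical stand-in for a real text-to-SQL model call.
    raise NotImplementedError

def answer(question: str, db_path: str = "warehouse.db") -> list[tuple]:
    conn = sqlite3.connect(db_path)
    # Ground the model in the actual schema so it targets real tables.
    schema = "\n".join(
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master "
            "WHERE type = 'table' AND sql IS NOT NULL"
        )
    )
    sql = ask_llm_for_sql(question, schema)
    # Guardrail on the generated output: allow read-only queries only.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError(f"refusing non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()

# e.g. answer("How many customers have renewed in the last few years?")
```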
And so each of these can
56:53
include structured or unstructured data, and each
56:56
of these can cover very, very different
56:58
use cases and very different applications of
57:00
the data. So if anything, I see
57:02
less homogeneous sorts
57:04
of applications of the data, if that
57:06
makes sense. And I actually anticipate that
57:09
this will carry through to the generative
57:11
AI stack. So, you know, people
57:13
create software in a multitude of different
57:15
ways, in a multitude of different stacks,
57:17
the same can be said for data.
57:20
There's not one single stack that rules
57:22
them all. There's not one single type
57:24
of data that rules them all in
57:26
order to create data. I think the
57:28
same will be true for generative AI.
57:30
There's not one single stack or one
57:33
single preferred language of choice. And there's
57:35
not one single preferred method, whether it's
57:37
structured data or unstructured data. I think
57:39
this does very much sort of vary.
57:41
I will say from my biased point
57:43
of view, the thing that is
57:46
common sort of going back to like
57:48
the foundation of truth and sort of
57:50
what is very important is like every
57:52
organization needs to have or needs to
57:54
rely on their enterprise data and make
57:57
sure that it's high quality trusted data
57:59
so that they can actually leverage and
58:01
capitalize on that. And I think it's
58:03
a messy, messy route to get there.
58:05
Maybe 2025 would be the year of
58:07
messiness. Sometimes you just gotta like lean
58:10
into the messiness, you know. You know,
58:12
on our like path, like this random,
58:14
you know, random path to kind of
58:16
figure it out. But there's a lot
58:18
more to figure out there, but
58:21
I don't see us sort of converging
58:23
on like one single path or use
58:25
case or even type of data. All
58:27
right, we've got to start to wrap
58:29
up. This is so good. And yeah.
58:31
Oh, we figured it all out. So
58:34
we're good to wrap. We can before.
58:36
Yeah, exactly. 2025 will just be the
58:38
year of leaning into the mess. And
58:40
maybe that's the best we can do
58:42
right now. Anyway, one thing we love
58:44
to do is go around the horn,
58:47
share a last call, something that might be interesting
58:49
to our audience. Bar, you're our guest.
58:51
Do you have a last call you
58:53
want to share? Sure. So this concept
58:55
that someone has shared with me recently,
58:58
which they'll call sort of watching the
59:00
avocado, if you will. So I don't
59:02
know if you experience this, but you
59:04
know, you buy an avocado and it's
59:06
like, it's not ready, not ready, not
59:08
ready, boom, you're too late. It's already
59:11
like you can't eat it anymore, right?
59:13
That happens to you, right? And so,
59:15
you know, I think the idea is
59:17
like a lot of sort of new
59:19
technologies and trends are like that. And
59:22
in this case, sort of this is
59:24
like, generative AI. Like, we're too early,
59:26
we're too early, we're too early, boom.
59:28
You know, you miss the boat. And
59:30
so I think one of that, you
59:32
know, things that I take away from
59:35
that is like as data leaders, as
59:37
sort of data practitioners, how do we
59:39
keep watching the avocado? We've got to
59:41
hit the avocado before it's too ripe.
59:43
But the timing matters here, especially for
59:46
a lot of these sort of trends
59:48
and technologies. Nobody likes bad guacamole. The
59:50
business person who now uses that when they're
59:52
talking somewhere internally, if they use the
59:54
analogy, please let us know. I want
59:56
to, I like that. We got to
59:59
watch the avocado. Yeah, it's awesome.
1:00:01
All right, Mo, what about you?
1:00:03
What's your last call? Okay, I've
1:00:05
been doing lots of thinking
1:00:07
about how I make 2025
1:00:10
really great. And I think
1:00:12
one of the tensions I've
1:00:14
found is that like I'm
1:00:16
naturally inclined to like want
1:00:18
to go fast and get
1:00:20
to the place that I want
1:00:22
to get to. And so this is
1:00:25
not anything other than just
1:00:27
Kind of a personal learning or
1:00:29
a personal goal that I've set
1:00:31
for myself. It is the start
1:00:33
of 2025 after all and that
1:00:36
I want to be more intentional
1:00:38
about enjoying the journey. And the
1:00:40
analogy I have is I love
1:00:42
going to the beach going to
1:00:44
the beach with two small humans
1:00:46
is really fucking hard. There's all
1:00:48
this shit to pack. You've got
1:00:50
to cart it all down there. Everyone
1:00:52
needs sunscreen on, like... And so sometimes
1:00:54
the bit of getting to the beach
1:00:56
is so unpleasant that by the time
1:00:58
you get there, you're all like flustered
1:01:01
and hot and you don't want to
1:01:03
be there and you're like, oh, fuck
1:01:05
it, let's all just go home. So
1:01:07
I'm trying to enjoy the journey to
1:01:09
get there more. So like, I went
1:01:11
to the beach the other day, it
1:01:13
took us an hour to get there.
1:01:15
My kids wanted to stop at this
1:01:17
playground, they wanted to look at the
1:01:19
bird, like, they wanted to have a snack.
1:01:22
So I leaned into letting it happen, enjoying the bit to
1:01:24
get there and not focusing so much
1:01:26
on kind of the end state. And
1:01:28
it's not just about kids, it's also
1:01:30
about work, right? Because like, if you're
1:01:33
constantly trying to like come up with
1:01:35
this huge amazing strategy and deliver this
1:01:37
project, but like you're miserable in the
1:01:39
months delivering it, that kind of, you
1:01:41
know, defeats the purpose. So anyway, that's
1:01:43
just my intention for the year that I
1:01:46
share. What about you, Tim? Well, my
1:01:48
publisher is gonna hurt me if I don't plug
1:01:50
Analytics the Right Way. So,
1:01:52
depending on when you're listening to this,
1:01:54
it is, like, 15 or fewer days
1:01:56
from actually being available, but Analytics the
1:01:58
Right Way is available for pre-order until
1:02:01
January 22nd, in which case it
1:02:03
will be available as a printbook
1:02:05
or an e-book and the audio
1:02:07
book's coming out four or five
1:02:09
weeks after that. So that does
1:02:11
have a section talking about human
1:02:13
in the loop versus on the
1:02:15
loop versus out of the loop
1:02:17
and some of the AI tradeoffs,
1:02:19
but it is not an AI
1:02:21
heavy book at all. So that's
1:02:23
my obligatory self-plug, my logrolling
1:02:25
last call. For fun, I will,
1:02:28
I've definitely last-called stuff from
1:02:30
The Pudding before, but one that
1:02:32
they recently had, it's at pudding.cool,
1:02:34
but it was Alvin Chang, who
1:02:36
got a data set that looked
1:02:38
at a whole bunch of different
1:02:40
roles, and it was how much
1:02:42
they spent of their time sitting
1:02:44
versus standing. So it's kind of
1:02:46
one of those like scrolling visualizations.
1:02:48
You enter kind of some stuff
1:02:50
about your job first, so it
1:02:53
can then kind of locate you
1:02:55
on it. But it's just a
1:02:57
simple x-axis that goes from
1:02:59
sitting all the time for work
1:03:01
versus standing all the time for
1:03:03
work, and then it looks at
1:03:05
a whole bunch of different roles. It
1:03:07
varies what the y-axis is as
1:03:09
you scroll through it. So it's
1:03:11
kind of just a fun visualization.
1:03:13
showing how tough on bodies a lot
1:03:15
of our professions are because they're
1:03:17
required to crouch or stand all
1:03:20
the time. They can't take breaks
1:03:22
and that sort of thing. But
1:03:24
it's just kind of a fun
1:03:26
interactive visualization. So worth checking out
1:03:28
What about you, Michael?
1:03:30
What's your last call? I mean,
1:03:32
it was gonna be the book.
1:03:34
Tim, I was. I was actually
1:03:36
ready to do one on the
1:03:38
book for you just in case
1:03:40
you didn't cover it, so good
1:03:42
job. We'll report back to your
1:03:44
publisher that you're doing it. You're doing
1:03:47
what you can do? No. So
1:03:49
actually mine is Recast, who
1:03:51
I think is some of the
1:03:53
best in the game when it
1:03:55
comes to Media Mix models. They've
1:03:57
started publishing a series of YouTube
1:03:59
videos on how to think through
1:04:01
the creation of those models and
1:04:03
I think it's a great watch
1:04:05
for anybody who's engaging with that
1:04:07
kind of data so I'd highly
1:04:09
recommend it and they've put a
1:04:11
couple out already and then I
1:04:14
think there's some more to come
1:04:16
so that would be my last
1:04:18
call. All right, so what is 2025 the year of? I would just have one word; everybody has to go around and do, like, a one word, or like a fast... No, no, nothing. Moderation.
1:04:30
I think, I think 2020, yeah,
1:04:32
there you go. I think 2025
1:04:34
is going to be the year
1:04:36
of being thoughtful, keeping up with the
1:04:39
work, increasing insights, maybe helping with
1:04:41
process. None of that's actually going
1:04:43
to happen, but I just sort
1:04:45
of like wish it were. So
1:04:47
that's my take on it. So
1:04:49
you use the one word for
1:04:51
all of us. You just, you
1:04:53
kind of took... We all deferred.
1:04:55
Or, well, nobody answered, Tim, so
1:04:57
I just figured we were not
1:04:59
gonna. I yielded my one word
1:05:01
to you. So yeah, I like
1:05:03
it. So I couldn't think of
1:05:06
a better person to help us
1:05:08
kick off 2025 with than you,
1:05:10
Barr. Thank you so much for
1:05:12
coming on the show. It's been
1:05:14
awesome. Absolutely, I hope 2025
1:05:16
will be you know even better
1:05:18
and greater than 2024 and you
1:05:20
know I would probably be remiss
1:05:22
if I didn't say that 2025
1:05:24
would be the year of highly
1:05:26
reliable data and AI. That's right.
1:05:28
What's the saying? From your mouth to God's ears, or whatever. That's something we absolutely would want.
1:05:35
Amen. Thank you so much. Awesome.
1:05:37
Thank you so much for coming
1:05:39
on the show again. And of
1:05:41
course, no show would be complete
1:05:43
without a huge thank you to
1:05:45
Josh Crowhurst, our producer, for just getting
1:05:47
everything done behind the scenes. As
1:05:49
you've been listening and thinking about
1:05:51
2025, we'd love to hear from
1:05:53
you. Feel free to reach out
1:05:55
to us. You can do that
1:05:58
via our LinkedIn page or on
1:06:00
the Measure Slack chat group or via
1:06:02
email at contact@analytics
1:06:04
hour.io. We'd love to hear your
1:06:06
thoughts. Other things that you think are
1:06:09
big topics for 2025 in the world
1:06:11
of data and analytics. So once again,
1:06:13
Barr, it's a pleasure. Thank you so
1:06:15
much for taking the time. We really
1:06:18
appreciate having you on the show again.
1:06:20
And you know, you're on track now.
1:06:22
We keep talking about the Five Timers
1:06:24
jacket. That's gonna be a thing. So
1:06:27
you're in the running. There's only been
1:06:29
a few people that have done this a
1:06:31
couple of times. Are you prepared to
1:06:33
have five kids, I guess is the
1:06:36
question. Like, we may need to break.
1:06:38
Anyway, so of course, I think I
1:06:40
speak for both of my co-host, Tim
1:06:42
and Mo, when I say, no matter
1:06:44
where your data is going, no matter
1:06:47
the AI model you're using,
1:06:49
keep analyzing. Thanks for listening.
1:06:51
Let's keep the conversation going
1:06:53
with your comments, suggestions, and
1:06:55
questions on Twitter at
1:06:57
@AnalyticsHour, on the web
1:06:59
at analyticshour.io, our LinkedIn
1:07:01
group, and the Measured Chat
1:07:03
Slack group. Music for the
1:07:05
podcast by Josh Crowhurst. So
1:07:07
smart guys want to fit
1:07:09
in, so they made up
1:07:11
a term called Analytics. Analytics
1:07:14
don't work. Do the analytics
1:07:16
say go for it no matter
1:07:18
who's going for it? So if
1:07:20
you and I were on the
1:07:22
field the analytics say go
1:07:24
for it? It's the stupidest
1:07:27
laziest, lamest thing I've ever
1:07:29
heard for reasoning in competition
1:07:31
So, yeah, my smart
1:07:34
speaker decided to weigh in
1:07:36
on that. I love it. What
1:07:38
did they have to say about that? Yeah. It's
1:07:40
the perfect little end note to that
1:07:42
particular. Yeah. Yeah. And Tim, probably be
1:07:44
a few minutes for you. Yeah. Her
1:07:46
thumbs-downed that. And in the background it was
1:07:48
saying, nope, I don't think I can.
1:07:50
Actually, it basically said I don't know
1:07:52
now that I think about it. It
1:07:55
was like, whatever it decided it heard,
1:07:57
which was nothing. Yeah. Perfect. Perfect. Rock
1:08:00
flag and lean
1:08:02
into the mess!