#262: 2025 Will Be the Year of... with Barr Moses

Released Tuesday, 7th January 2025

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.

0:00

Welcome to the Analytics Power Hour. Analytics topics covered conversationally and sometimes with explicit language.

0:09

Hey everybody, welcome. It's the Analytics Power Hour, and this is episode 262. Hey, happy New Year. You know, 2025. That'll probably be the year of... well, what exactly? There is a pretty steady flow of prognostications every year about the things that will define the coming year, and we're not, you know, completely immune to the desire to define the future. I didn't say that very clearly, but we do want to define the future. So what will 2025 bring? It's probably the year of Tim Wilson still being frustrated with people calling stuff "the year of..."

That's fair. Probably accurate, yeah. You can just end with "Tim being frustrated with people." You don't really need to... oh, there you go. Further qualifiers on it: not necessary. We still like you.

And 2025 will probably be the year of Moe still liking Adam Grant and Brené Brown. Hey, Moe.

1:10

Yeah, probably, actually. That's a very good prediction.

There's going to be a huge scandal with one of them between, like, recording this and it coming out. Oh, jeez. It's going to be. All right.

Well, and I'm Michael Helbling. Some attempts at categorizing the future that is coming at us awfully fast are definitely warranted. So what better time than the first episode of 2025? You know, insert Zager and Evans pun here. And to do this right, we wanted to have a guest who has a great track record of observing our industry and seeing where the puck is going. Barr Moses is the co-founder and CEO of Monte Carlo, the data reliability company. As part of her role as CEO, she works closely with data leaders at some of the foremost AI-driven organizations, like Pepsi, Roche, Fox, American Airlines, and hundreds more. She's a member of the Forbes Council and is a returning guest of the show. Welcome back, Barr.

2:06

Thank you so much. I am honored and pleased to be a returning member.

No, we're serious. We love the way that you take such an interest in having, from your level, a really clear view of where our industry, the data industry, is going. Before we get started, let's just get a recap of what's going on with you and Monte Carlo.

2:28

Yeah, it's been a whirlwind couple of years, not only for Monte Carlo but, I'd say, for the entire data industry. Like, I'm just reflecting: the last time I was here, this was 2021. We were just kind of, you know, coming out of COVID. I think we were all getting comfortable behind the camera and feeling comfortable at home, and, you know, the world is obviously very different today. But maybe just a quick recap: you know, Monte Carlo was founded to solve the problem of what we call data downtime, periods of time when data is wrong or inaccurate. And, you know, five, ten years ago, that actually didn't seem important at all. Like, I think people spent some time thinking about quality of data, but, you know, you guys know this better than I do, it probably didn't get the diligence that it deserved back then. Like, you could kind of skirt around the issue. It was very common at the time to just have, like, extra eyes on the data to make sure that a report is accurate. And if it was wrong, you kind of went, like, oh, shucks, so sorry, and kind of moved on.

Sorry to interrupt, but I also think it maybe wasn't as complex. And so, like, you know, as complexity has grown, the ability to troubleshoot and dig into why it's not reliable is even harder. But, sorry to break your stride.

3:50

Not at all. No, I think that's spot on. And maybe just to unpack that a bit: I think it was less complex because, one, the use cases were limited, right? So today we call it a data product and have very fancy names for it, but the use case was maybe just revenue reporting to the street, right? The use cases were fewer, the timelines were slower, so, you know, you maybe used data once a quarter to report the numbers. And also there were fewer people working with data, so maybe it's, like, a couple of analysts under the finance team. So you really had a lot more time, fewer use cases, less complexity, and the stakes were lower, right? And so in all of those instances, it kind of didn't really matter if the data was accurate or not.

And then there was this big wave of actually, like, people starting to use data. Remember when people would say, oh, we're data driven? And you kind of, like, didn't really believe them. There was a period back in time, you know?

And still happening.

Still happening, totally agree with you. So I think there was this, like, you know, big push, and that's sort of when Monte Carlo created the category of data observability, which is basically allowing people creating data products, whether those are data engineers, data analysts, data scientists, anyone working with data, to make sure that they are actually using trusted, reliable data for that. And sort of, you know, helping when someone's looking at the data and is like, WTF, the data here looks wrong, helping those people answer the question of what's wrong and why. That was sort of how Monte Carlo was born.

Now fast forward to today, I can't believe it's almost 2025, it's like four years since then. You know, I like to say that I think the data industry is a little bit like Taylor Swift: we kind of reinvent ourselves every year. We need, like, an Eras Tour to go through all the, you know, periods of time of the data industry. And with the most recent era being swept by generative AI, the implication of that is that bad data is even worse for organizations. We can unpack what that means, but at a very high level, what Monte Carlo does is help organizations, you know, enterprises, make sure that the data that they're using to power their pipelines, power their dashboards, power their generative AI applications, is actually trusted and reliable. And we do that by, first and foremost, knowing when there's something wrong, right? Like, knowing if the data is late or inaccurate, but then also being able to answer the question of why is it wrong, and how do we actually resolve an issue? I'll sort of pause there, sort of a long answer and a lot more that we can go into, but, whoa, it's been a fun couple of years.
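As a rough illustration of the "knowing when the data is late or inaccurate" idea, here is a minimal sketch of the kind of freshness and volume checks a data observability tool automates. This is illustrative only, not Monte Carlo's actual implementation; the thresholds and row counts are hypothetical.

    import datetime as dt
    import pandas as pd

    def data_is_fresh(last_loaded_at: dt.datetime, max_lag_hours: float = 24.0) -> bool:
        """Freshness check: has the table received new data recently, or is it 'late'?"""
        lag = dt.datetime.now(dt.timezone.utc) - last_loaded_at
        return lag <= dt.timedelta(hours=max_lag_hours)

    def volume_looks_normal(row_counts: pd.Series, n_sigmas: float = 3.0) -> bool:
        """Volume check: does the latest row count deviate sharply from recent history?"""
        history, today = row_counts.iloc[:-1], row_counts.iloc[-1]
        return abs(today - history.mean()) <= n_sigmas * history.std()

    # Hypothetical example: 14 days of row counts where the last load collapses.
    counts = pd.Series([10_230, 10_180, 10_410, 10_090, 10_320, 10_250, 10_500,
                        10_310, 10_270, 10_440, 10_190, 10_360, 10_280, 1_200])
    if not volume_looks_normal(counts):
        print("data downtime: open an incident and trace the pipeline upstream")

In practice checks like these run against warehouse metadata across many tables, and the harder half is the second part Barr describes: tracing why a check failed, not just that it did.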

6:37

Well, but also, I mean, one, I guess just to clarify: we're not saying that in 2021 people weren't using data. I mean, that's been ramping up for a while. I think also the modern data stack... I'm not sure where that phrase was on the curve of inflated expectations, but it definitely feels like, since the last time you were on, the modern data stack as a phrase has slid into the trough of disillusionment, at least a little bit, which is kind of interesting. I don't know exactly how that applies to where we're going from here, but I feel like there was a point where it was like, if we just have all these modules plugged in together with the right layers on top of them, then all will be good. And it feels like we're a little past that, that that nirvana, even if we got there, wouldn't actually necessarily yield the results that were being promised. But, yeah.

7:38

I mean, I think, look, putting myself in sort of the shoes of data leaders today, you're facing a really tough reality, because, like, every 12 to 18 months you're being thrown a new concept. Call it the modern data platform, call it generative AI, call it whatever you want. You're sort of expected to be on top of your game and understand the, you know, word or trend du jour. I think if you unpeel that for a second and go back to fundamentals, there are a couple of things that I think remain true regardless, and have remained true for the last 10, 15 years.

First and foremost, organizations want to use data, and data as a competitive advantage. How you use it and in what ways, like, I think that is undisputable. Like, strong companies have strong data practices and use that to their advantage. You can talk about how, for example, you can use it for better decision-making internally. That was sort of one of the dominant use cases in the beginning. You can use it to build better data products. Like, for example, you can have a better pricing algorithm. And I think today, and we can talk more about this, data is the moat for generative AI products and innovative solutions. And so, regardless of where the hype cycle is, I think one core truth is that data matters to organizations, and so data continues to be a core part of organizations.

I think the second sort of fundamental truth that we believe in is that reliable data matters. Like, the data is worthless if you can't trust what you're working with. You know, this even goes without saying, but having something that you can trust is sort of fundamental to your ability to deliver.

And then I think the third thing that has always remained true is that innovation matters. Like, you have to be at the forefront, and so organizations that are doing nothing about generative AI, or doing nothing to, you know, learn what's next, will be in a difficult position. I'm curious for your takes.

You know, one of the by-products of that was that data leaders were met with many solutions for many problems, but actually were inundated with perhaps too many solutions. And so they ended up in a position where they had to make bets on a variety of solutions, and ended up with maybe sort of a proliferation of tools. And now there's a big movement to actually consolidate that, or cut back to what's necessary. And so if you're not solving a core, fundamental truth, then you probably don't deserve to live in the modern data stack, if that makes sense.

10:08

You don't deserve to live in the modern data stack. Sorry, I so deeply love when the podcast intersects with things that are, like, completely churning through my brain at the moment. It is, like, this beautiful, like, chef's kiss, because these are all concepts that I've been giving a lot of thought to over the break. I want to dig into what you mentioned: data can be a moat. Can you say more about that, especially, you said, I think, relative to gen AI?

10:47

Yeah, for sure, I'm happy to. So, you know, let's think about the last, I want to call it, year or two in generative AI. I'll actually start by sharing a survey that we did that I thought was really, really funny. We basically interviewed a couple hundred data leaders and asked them whether they are building generative AI. Can you guess what percentage of data leaders said yes?

Probably all of them are saying that they are, at least.

Really. Yeah, so, like, I think it was 97%. Like, not a single person... yeah, you're just spot on, Michael.

Oh no, we're all doing it, for sure. All doing it. We're all doing it. We're all doing it. Everyone. 2025 is the year of maybe building with AI. Maybe.

Maybe we're all doing it, right? How often do you do a survey and get an almost 100% response rate, right, like, for a question? It's a pretty big outlier. The second question that we asked was: do you feel confident in the data that you have? Like, do you trust the data that's running it? What do you think? What percentage of people trust the data that they're using for gen AI?

70%?

That's not bad. It was a 70? Okay, because the Duke business school used to do a CMO survey every year, and they would ask data questions like that, and there was usually about a 60% gap between how important it is versus how much they trusted it. It was always a very big delta. So, yeah.

That's exactly right. So 60% said they don't trust it, so I think that's exactly the delta. So only one out of three trust, and two out of three don't trust, the data. So it's interesting that everyone is building generative AI, but no one has the core component to actually deliver said generative AI. I think that speaks more to kind of human nature, right? And what we want to be versus where we are.

12:49

Can I ask... this concept has been rolling around, and I've been, like, digging up old blogs on it, but it just seems to have dropped off. There was a lot of hype, I feel like it was probably two years ago, but, I mean, the last four years have blurred together, so it could be anywhere between two and six years, about a metrics layer, right? And I feel like I've had to do all this, like, mental processing around, like, how does the metrics layer or semantic layer differ from, like, a star schema data warehouse, to, like, have a reliable data set? But it doesn't seem like anyone is talking about that right now. And I'm curious to hear your perspective.

13:31

Wow, that's a really good question. You know, I'm curious for your opinions too, but, going back to the Taylor Swift analogy from before, I think there's this desire to chase the shiny object right now. And going back to this survey: like, if you're not talking about gen AI, you're going to be left behind. And I think there's a lot that goes into delivering generative AI right now; we can talk about what those things are. And I'll go back to your moat question in a second as well. But I think if you're not on track, or don't have a really strong, solid answer for how you're getting on track, you're kind of on the hot seat right now as a data leader. And so I think that has just sucked the air out of every single room where there is a data leader or an executive leader.

And I'll explain what I meant by data as the moat. If you think about what a data leader needs to do now, basically the first thing that's being asked is, like, what models are you using? You know, what foundational models are you using, what LLMs are you using, etc., right? Like, between, like, OpenAI and Anthropic, etc., there's lots of options. The thing is, every single data leader today has access to the latest and greatest model. Everyone has access to that. I have access to that. Michael, you have access to that. Everyone here has access to the models that are, like, supported by 10,000 PhDs and, you know, a billion GPUs, right? And that is true for me and every other company around me. So, in that world, how do I create something that's valuable for my customers? How do I create something that's unique? Like, what is the advantage? Like, I can create a product just like you can create a product, so what's the distinction here? Like, for example, if I'm a bank, how can I offer a differentiated service if I have access to the exact same model as you do, and the exact same ingredients of a generative AI product, if that makes sense?

And so I think what we're learning is that in putting together these generative AI applications, which today are really limited to chatbots, if you will, or sort of agentic solutions, etc., in all of those instances, the way in which companies make those products personalized or differentiated is by marrying in, by introducing, their enterprise data, basically corporate data. So let's take a practical example. Let's say I'm a bank and I want to build a financial advisor solution. I want to be able to help Tim fill out his taxes. I'm able to do that better if I have data about Tim's background, his car, his house, whatever it is. And so I can offer you a much better, differentiated product if I have reliable data about Tim that I can use. And that's the only difference between bank one and bank two: what kind of data we have to power that product. So, just to summarize: we all have access to the latest and greatest models, but the only thing that differentiates different generative AI products is the data that's powering them. And so that's why data is actually the moat in the world of generative AI.
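The "marrying in enterprise data" pattern Barr describes is, in practice, usually retrieval-augmented generation: fetch the customer's own records and put them in the model's context. A minimal sketch of that idea, with every name hypothetical; this is not any bank's actual system, and call_llm stands in for whatever model API you use:

    # Toy illustration of grounding a generic LLM in proprietary data (the "moat").
    CUSTOMER_RECORDS = {
        "tim": {"state": "OH", "mortgage_rate": 0.052, "accounts": ["checking", "brokerage"]},
    }

    def build_prompt(customer_id: str, question: str) -> str:
        # Retrieval step: pull only this customer's vetted records into the context.
        record = CUSTOMER_RECORDS.get(customer_id, {})
        context = "\n".join(f"{k}: {v}" for k, v in record.items())
        return (
            "Answer using only the customer data below.\n"
            f"--- customer data ---\n{context}\n--- end ---\n"
            f"Question: {question}"
        )

    print(build_prompt("tim", "Which tax forms should I expect this year?"))

Two banks calling the identical model get different products only through what lands in that context, which is why the reliability of the records matters more than the choice of model.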

16:37

But, I mean, I guess, counterpoint: I feel like that is coming from a super data-centric perspective. I mean, and I guess this is what terrifies me, is that the year 2025 could be supercharging this obsession with more, more, more, more, more data. As you throw more data in, it's harder to keep it clean; you've got more things that can conflict. And we fought this battle in the past, where you chase all this data, because any time something isn't seen as valuable, the easy thing to default to is to just point to some data that's not clean enough, or not clean (it may be clean enough, but it's never going to be perfectly clean), or data that's missing. And so that can feed this, like, horrendously vicious cycle where we completely lose sight of what we are trying to do, and it becomes "get as much data as possible." Like, the counterpoint is, those banks could differentiate by thinking about, with way less data, what their customers really value, what they most need, right? And it's not an either-or, but if there is deep understanding of their customer and what they value, it may need very little data. It may be using the data they already have in a different way. So I think there has to be that balance. I would hope that we get to that point of, like, we can't just be in this arms race for more and more models, more data, more whatever. So, okay, Moe, unleash.

17:59

Okay, so my visceral reaction. My visceral reaction is, like, I can absolutely see that some people would use, like, what you're saying, like, the gen AI hype train, to be like, we need more data. I don't think that's what Barr is saying, but I will obviously give you the opportunity to speak for yourself. Because, like, my reaction is: it's not about the quantity, it is about the quality. Like, it is not about "let's collect more data." It's that the last few years have been all about, like, let's have fucking data lakes, let's just dump data from back-end services into anywhere, and it's created, I mean, I think we've said a swamp before, but it's like: you can't ask important questions, like what do my customers value, if the data that's there is a complete trash fire. And I'm not talking about quantity.

There's also this distinction of, like, it is so easy to say, "I found an error in the data. This field is missing, or this field is incorrect. Fix it," as opposed to, as you just said, if your data is a dumpster, a trash fire, there is a gradation of it. So put aside the "more, more, more data" and "bring in the pristine data." On that point: it is so easy to find a problem in the data and chase that and extrapolate from that. So absolutely we need proper governance, but you can overplay either one. On more, more, more data, you can absolutely Google and find all sorts of articles that say the ones who are going to win are the ones who collect all the data. And on the other side you will find, I completely grant you, "garbage in, garbage out." I mean, that is, like, a pat phrase that may become my next favorite thing to hate on, after, again, "In God we trust; all others must bring data." Like, it's so easy to say "garbage in," and it's like, well, people are not pouring garbage in. Yes, there are errors. Yes, there is process breakdown. Yes, there needs to be governance and observability. But it is so easy to say, if we're not getting value out, oh, it's a data quality issue, and now you can get equally obsessed over chasing that. So, Moe, I feel like you were again putting words in my mouth, and, like, well, it's not bad at all.

No, no, no. I just think sometimes, like, when we're discussing this concept, there are, like, extremes, says the one who said "dumpster fire," like, it sometimes is interpreted as a binary thing, and it's not. Like, I do think there is a spectrum; it just often happens that you're at one end of the spectrum and I'm at the other end. But let me just elaborate on what I mean by quality, because, again, I can see a situation where a business goes, "we must have perfect data," and that's not what I'm saying. I'm saying the data has to be meaningful, so that you can create connections between different data sources, and so that the way they relate to each other is consistent, so that different areas of the business are not tripping over themselves making mistakes because it's, like, fundamentally so unstructured. So, to me, it's about how all those things connect together. It's not just about, like, is this number accurate to the 99th percent or whatever. It's... I don't know, I'm gonna just shut up and let Barr talk, because I feel like she probably...

No, I love this. I love hearing all of this. Yeah, I love it.

21:59

Well, okay, so, a couple of things. One, obviously I'm biased, right? Like, I have a very data-centric view, and I will not for a minute pretend that I'm anything but biased, right? And I think my bias comes from a place of, like, yeah, I think data is the most interesting place to be in, in the past five, ten years and in the next five, ten. I think it's, like, the coolest party, that everyone wants to be a part of. And, like, they should. And, you know, I'll continue thinking that. I wake up every day and choose to be part of the data party, and I think it's where we're having fun. So, yes, I'm 100% biased.

I agree with you: I think data hoarding has been a huge issue, a huge problem, and I think it's been sort of a strategy that has largely failed. Like, oh, let's just collect all the data and hope that it solves something, or, you know, think that more data is more helpful. It's actually interesting: I was just sitting down with the founder of a data catalog company a couple of days ago, and we were talking about how 95% of the questions that people have of data have already been answered. And so their challenge is just finding the answer and surfacing it. There's very, very little net-new insight being created, if that makes sense. And so really their challenge is about how do we help people, or users, discover the answer, versus create a new answer. Which is actually mind-blowing, if you think about what a small percentage of new insights are being generated. Like, it sort of made me a little bit sad for, you know, the human race, but also happy that maybe we can solve this. But I digress here.

My point is, the point that you're making, Tim and Moe, is an important one. I definitely don't think that more data is necessarily better. In fact, I think there are a lot of areas where less is better, and, like, more precise answers are better. Not for a minute am I advocating for that, not at all. I think what I am saying is, you know, if you look at, like, ChatGPT, or kind of things that anyone has access to, that's data that everyone has access to. It's funny, you know, people used to say, "let me Google that for you," and I was trying to think what the new version is. Like, "let me Perplexity that for you"? I don't know, it doesn't roll off the tongue quite as much.

Yeah. Well, "let me ask Claude" would work, you know.

Exactly, let me ask what Claude says. But I think everyone sort of has access to that. But if you have some data about your users, right, let's take, like, I don't know, a hotel chain that's trying to create a personalized experience for their users: like, no one knows as much as they do about, you know, how you like to travel, the kind of food you like to eat, the kind of, you know, ads that would speak better to you. Not that I'm advocating for, like, an ad-centric world, but my point is, like, the power today, and where I think the leverage lies, is in having things that not everyone has access to. Everyone has the latest and greatest LLM, so that cannot be your moat or your advantage. By no means does that mean we have to have too much data, or a lot of data. I'm not advocating for that, and I think it's a very important clarification.

I actually will say that, oftentimes, in the companies at least that I work with, one of the biggest challenges is that they have so much data, they don't even know where to get started. And so a lot of the work is actually saying, let's try to... you can think of, like, layers of important data, tier one, tier two, tier three, and think about, like, what are the core data sets that we care about, making sure that those are really pristine and reliable. So, oftentimes, actually starting small is the winning strategy. I find, when we work with a company and the company is like, "I want to observe everything, wall to wall," I'm like, whoa, whoa, hold on. That's going to be really hard. Like, tell me, why are you actually using all of that data? That strategy often fails. And so I'd much rather start with: what's a small use case that you actually really are using the data for, and that's really important for users? Let's start with making sure that that's really highly trusted and reliable. So "I agree with you" is my point here, and I think it's an important clarification.

Moe, are you gonna...?
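The tier one, two, three idea lends itself to a simple configuration. Here is a sketch of what that prioritization might look like in practice; the table names, check lists, and alert channels are all hypothetical, not any real product's schema:

    # Hypothetical tiering: deep monitoring on the critical assets, lighter
    # checks further down, nothing on scratch data.
    TIERS = {
        "tier_1": {  # powers exec-facing reporting; page someone on failure
            "tables": ["finance.revenue_daily", "marketing.cmo_dashboard_feed"],
            "checks": ["freshness", "volume", "schema", "field_quality"],
            "alert": "pagerduty",
        },
        "tier_2": {  # team-level reporting; notify in Slack, no paging
            "tables": ["growth.signup_funnel"],
            "checks": ["freshness", "volume"],
            "alert": "slack",
        },
        "tier_3": {  # exploratory or scratch data; no automated monitoring
            "tables": ["sandbox.*"],
            "checks": [],
            "alert": None,
        },
    }

Starting small, in this framing, means getting tier 1 genuinely trustworthy before widening coverage, rather than observing "wall to wall" on day one.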

26:25

No, I am, like, waiting for the next, like, rant.

We can rant, by the way. I'm happy to rant about "garbage in, garbage out." I think that is a great rant. I'm happy to, like, you know, carry the torch on ranting against that, Tim, if you'd like. I don't know if you want to share why you want to rant; I'm happy to share my rant about it.

Go for it. So I'm curious, Tim: like, when I said that stuff about, like, connectivity, what's your view on that? Because I feel like you can only answer important questions if the data is, like, kind of, I don't want to say structured, but I'm thinking about, like, Barr's comment of, you know, the competitive advantage that you have is your data set. Like, it's not the models, right? So how that all works together then, to me, becomes the most important bit. And, like, I really like Barr's concept. Actually, someone on my team did this recently, where they went through, like, what's tier one, tier two, tier three, and I think it's such a great framework to help the business understand the different levels of importance. But, like, Tim, what are your thoughts on that connectivity piece?

27:42

So, one, I mean, there is nuance. I try to not say things like "it all has to be connected," or "it's a dumpster fire," or "it's perfectly pristine." And maybe I fell into it a little bit with the "we chased the more and the more and the more." But, I mean, I would love for there to be a little bit more discipline and nuance. Like, Barr, when you said "starting small": there is no pressure, no force in business right now that says, when doing anything with your data, you should go lock yourself in a room with some smart people and a whiteboard, and then come out with a mandate that it's an absolute minimalist approach, and then you build from there. And I feel like I see this, I mean, I'm spending too much time on LinkedIn and reading articles: if someone says, "this is data that we uniquely have as a bank or a hotel chain," they make the leap to "we have it, therefore we need to feed it in and connect it, because that is something unique to us, and therefore it provides competitive advantage." That's kind of the default position: it's our unique data, we must use it. And where I see that going wrong is there's a missed step, to say, like, really? Just because we have it uniquely doesn't mean it's necessarily valuable. If somebody says, "here's why we think it can be valuable; what's our minimum viable product, what's our minimum way to test that it would be valuable?"... but instead there's this tendency to say: it's ours, put it in the system, make sure it goes through, that it's pristine.

Which, when you flip it around to LLMs: they're doing stuff probabilistically. Like, hallucinations are coming out, all of that's getting better, but even with pristine data going in, it's going to give kind of inconsistent results. And we're kind of like, oh, that's cool. Well, then... I can't remember who wrote it, it might have been Ethan Mollick or somebody, who pointed out, like, yeah: it's not that if you put pristine data in, you're gonna get a definitive, deterministic answer out. If you put pristine data in, you're gonna get a probabilistic answer out. If you put noisy data in, you're gonna get a probabilistic answer with a bigger range of uncertainty. And I just think there's thought and nuance to say: have a bias towards less. And it's not saying don't do it, it's just saying move with deliberation, so that you figure out something is a tier one, and then you say, that's tier one, it's a differentiator, lock that in and make sure that it is clean when you're connecting it to something else. You know, so that's... well, I guess that was... I was like, I'm not gonna rant about this, I'm gonna have a very nuanced thing to say, and then, whoop, here it comes.

30:55

That was very eloquent. No, that was eloquent. But, okay, can I add some color to the situation, right? Like, I feel like there are some companies that still have, like, a highly centralized model for how they store their data, or how it's built, that sort of stuff. Like, my world is very different to that: everything's done completely decentralized. So, like, in marketing we have marketing analytics engineers and data scientists creating data sets, and then over in the growth team there are people creating data sets, and over in teams in education, and, like... even if you start with that "let's do something small," it's often created in isolation. And the problem is, it's really hard to answer a cross-cutting business question, like "what's important to our customers" or "what do our customers value," when everything is built in this completely decentralized model. Because, like, if I take my tier 1 tables and data sets, that will be completely different to another department's tier 1 data sets. Like, you might not be able to answer that question. I agree, like, just to be clear, I totally agree. I love this idea of starting with less, but you can only start with less if it is, I don't know if the right word is, like, company-wide, or, like, centralized. Like, I feel like there's this tension in how technology is built in some companies.

Can I quickly... I'm going to admit this is unfairly picking on an example that you just threw out. If it's, like, "what do our customers value?", and it's like, well, I have to have all the data and hook it all together... or I could field a study and ask them. You know, there's this story out there of: I'm going to plug it in, I'm going to launch it on my intranet, and I'm going to say, "what do our customers value the most?", and then, through all of this magic, it's going to generate it. And you say, well, why does it have to connect all of this stuff? If that's a fundamental question, then there are alternative techniques that have been around for 50 years, which is usability testing or focus groups or panels, for some of that. That's unfair, because I just yanked that out as one example.

So I'm going to acknowledge: fair point. But, yes, I agree that there are other research methods that would be more appropriate there. Again, I'm going to shut up and let Barr speak.

33:25

No, not at all. I love this. I feel like I'm being asked questions that I haven't thought of in a while, so that's good. No, I mean, listening to this, my reaction is a couple of things. One is, you know, going back to data leaders being faced with sort of a really tricky part of their journey. You talked a little bit about what a great model looks like for a team: like, is it centralized or decentralized? And I think organizations go back and forth on that. And it also is a little bit a function of the environment in which they operate. We work with companies who operate in a highly regulated environment, so think, like, financial services or healthcare or anything like that. In those instances, they are actually subject to significant regulations and audits, and you really need to have really strong data management and data quality controls in place. And oftentimes those need to be across your entire data estate. That is sort of, like, table stakes: you can't really operate without that. And I think that's very different from, you know, a retailer organization or retail company, or, you know, an e-commerce company. So, first and foremost, I think this is really dependent on the environment you're operating in, and also on what problem you're trying to solve. When we say data products or generative AI applications, it's very broad, and if you really think about what actually is being used, there's a couple of things. One is, like, creating, you know, a personalized experience for your customers, but it can also be inwardly looking for a company, sort of automating internal operations. So, an example: a Fortune 500 company that we work with has a goal that 50% of their IT organization's work needs to be either completely AI automated or AI assisted. That's their goal, and that's in terms of internally automating sort of human, manual tasks. And so, you know, I think it depends on what you're trying to solve, and I think that's what data leaders need to ask themselves today.

Maybe one thing that's coming out of that is, I think there's this blurring line between different people working with data. You know, in the past, you could really draw the lines more clearly between engineers, data engineers, analysts, data scientists. All of that is becoming a lot harder to distinguish, and my view is, you know, the teams that will be building generative AI applications will be a mix of that. So it will include both engineering and data people. Like, how does this work? Someone wakes up at a data company and is like, "hey, CTO, go build a generative AI application," and so a bunch of engineers run off and build something. And then someone's like, "hey, CDO, go build a generative AI application," and so a bunch of data people run off and build stuff. And so you end up having data teams trying to build stuff. But at the end of the day, a strong generative AI application, or any data product, needs a good UI, which should be built by software engineers. Like, that's not the data team's job. And it also needs good, reliable data pipelines, and, you know, you don't need a front-end engineer to build a data pipeline. And so I think, in the end, there will be some convergence of what the roles are, but right now there's a lot of people crossing lines, and lots of blurry lines in between.

37:03

And what's your perspective on data products being more of, like, a platform product? Like, versus... I don't know, I feel like there are many ways you could cut it, right? Like, sometimes data products seem to sit more in, like, a marketing technology space or whatever, but it seems at the moment there is kind of a lot of perspective about it really sitting in that product platform sphere. And, like, platform PMs are quite different, as well, to, like, a customer-facing product manager.

Yeah, I mean, I think if you look at, like, the product... oh, go for it, Tim.

Well, I just want to clarify. So when you say a platform, are you saying the data product is a platform that then winds up serving a bunch of different use cases? Are you saying it organizationally? Are you saying the data product is a platform with a bunch of features? Like, what do you mean?

Yeah, when I say platform product, I'm more meaning, like, the products that you build, I suppose, in-house, that serve as, like, the platform for internal stakeholders, and, like, the tools that you're building to service your organization. And, I suppose, as I'm saying this out loud, I'm like: I suppose you could have data products that would be doing that; you could also have customer-facing data products, and those things would probably be different. Oh, wow, I really answered my own question there, haven't I?

38:25

No, it's okay. I can elaborate, but I think you did answer parts of it. So maybe also just to take a step back for a second: if you think about data products and where they are in the hype cycle, there's this hype, and then it plateaus, and then you're like, oh, now I can actually make this product, now I can actually really use this thing, which is good. I think data products can really mean whatever you want. It could be, you know, let's walk through a simple example: an internal dashboard that, like, the chief marketing officer is using every day, right? So it's basically a set of dashboards or a set of reports, and then there's a lot of tables, following a particular lineage, that feed into that report. And so it could be a combination of, you know, user attributes and different information about those users, and also some user behavior, and it could be a bunch of different third-party data sources. All of that can be part of a data product. And you can describe that as basically all the assets that are contributing to said report or dashboard that the CMO is looking at.

My point is, you can basically use data products as a way to organize your data assets, and to also organize your users and data teams. And so, to me, it's less of a question of, you know, is this part of a platform or not, because that varies, as I mentioned, by the organization and the size and maturity of the organization. For me, it's more a way for companies to organize what they care about. And so, oftentimes, if we work with a data platform team, we'll say, hey, what's the data that you care about? And they might tell us, oh, you know, we have a marketing team that really focuses on our ads business, and the CMO there looks at this dashboard every morning, and they are so sensitive to any changes that they have there. And so we want to make sure that all the data pipelines, from ingestion and third-party data sources, through transformation, through to that report, are very high quality and accurate. So we want to make sure that that entire data product is trusted. That's one way to think about it.

Now, the ownership of those assets can be by the data platform team itself, or it can be by the data analysts that are actually running the reports. Oftentimes it's a combination of both. So you might have data analysts looking at the reports, the data platform team running the pipelines, and an entirely separate engineering team owning the data upstream and the different sources. And so, oftentimes, it's actually all of them contributing to a set data product, if you will. But to me, where data products are most useful is as a way to organize data assets, and organize a view of the world, for a particular domain, for a particular business outcome, if that makes sense.

41:22

Do the data product... this is, I guess, for both of you: data product product managers, like, what's the breadth? Do they engage all the way up to the upstream engineering owning the data creation, all the way through to the use case and the need? Or is there a natural cutoff where they say, "this is now engineering's problem; they just need to be managing the data coming in"? Like, how broad does that role go? And I guess maybe there's a precursor question: does that role get defined and exist, as in, "you are a data product product manager for this data product or set of data products"? And if so, what's the scope of that role?

what's the scope of that role? Yeah, doesn't

42:03

it depend on the organization? Like,

42:05

I mean, we're having lots of

42:07

conversations at the moment, because like

42:09

I said, we have a decentralized

42:11

model, which is quite unique, right?

42:13

Because like, well, it's not unique,

42:15

but like, it creates different layers

42:17

of accountability, right? Because like, if you

42:20

have engineers that have a back-in

42:22

service and they're pushing that data

42:24

to you and then you're building

42:26

a data product off it, like...

42:28

The question that comes to mind

42:30

for me is like who's accountable?

42:32

Well, like, it's not an easy answer in that

42:34

model. I think it's a responsibility

42:36

of the teen that are in the

42:38

back-end service to make sure that the

42:41

data is getting pushed correctly. out, but

42:43

then likewise for the people who are

42:45

receiving it, like they have layers of

42:47

accountability as well as the people who

42:49

are using that data, but like in

42:52

a completely different model where you don't

42:54

have that, like you have a more

42:56

centralized model, those lines of ownership could

42:58

be different, right? And so I think

43:00

it's so dependent on the on the

43:03

company and how they're structured to understand

43:05

where something starts and ends.

43:07

I think it's probably... impossible

43:10

to think that a data

43:12

product PM would own everything

43:14

completely end to end. Like,

43:16

I can't envisage a world

43:19

where that would happen just

43:21

because there are so many

43:23

different parts of the bit.

43:25

Like, I don't know. Anyway, I'm

43:27

not making a lot of sense

43:30

now. Yeah, yeah. I mean, this

43:32

Yeah, yeah. I mean, this is maybe, you know, not what you'd want to hear, but I think it's an "it depends" answer. Like, it depends on the maturity of... I mean, I don't want to repeat what Moe said, but I strongly agree with that: it's hard to draw the lines. I think some of the teams that do this better are those that are able to have, like, a strong data governance team that can actually clearly lay out what that looks like. You know, the most common model is something like a federated model, where you have a centralized data platform, like what you said, Moe. The centralized data platform defines what excellence looks like, what great looks like. So they might define: these are the standards for security, quality, reliability, and scalability, and whenever you're building a new data pipeline or adding a new data source, you need to make sure that it passes these requirements on each of those elements. And so, in that way, the centralized data platform defines what great looks like, and then, no matter what team you're on, this could be the data team serving the marketing team or the finance team or whatever use case it is, you adhere to the same requirements that the centralized team has defined. So we see a lot of that. And I think, with generative AI, we will see more of that, because, maybe going back to what we said at the very, very beginning of the call, how we used data ten years ago was a lot simpler; there were very few use cases and very few people using data. Now the need for a centralized, you know, governance definition is more important.

I mean, you kind of see this... I think the, you know, LLM or generative AI stack is still being defined, but one of the questions you raised, Tim, was, you know, hallucinations are very real, right? And when you release a product and the data is wrong, there can be a colossal impact, both on your revenue and your brand. Maybe the example that I like to give the most is, I don't know if you all saw this, it sort of went viral on Twitter, or X. I'm not going to get used to that thing, but it went viral on X. Someone did this thing on Google, where basically the prompt was something like, "what should I do if my cheese is slipping off my pizza?" And the answer was, like, oh, you should just use organic superglue.

Oh, wow!

It's obviously a bad answer, right? And honestly, I think Google can get away with it because of the strong brand that Google has these days. And so, yeah, I'll probably continue to use Google even though they gave me a shit answer about, like, organic superglue for my pizza. But most brands, if I'm, you know, an esteemed bank or an airline or a media company, can't afford to have those kinds of answers in front of their users. And so actually getting that in order is... you know, again, Google can get away with it, but, like, 99% of us cannot.

46:34

Nice. I want to switch gears just a little bit and talk about something else that obviously ties in, but also kind of reintroduces a lot of challenges, which is unstructured data. Going into next year, one of the articles I was reading that you'd written, Barr, was kind of saying that's going to be one of the things. Could you give a perspective on that? Okay, so we're going to be using a lot more unstructured data, but then, how do we take all the things we've just been discussing about how challenging data is, and now slam a whole new set of challenges on top of that, that kind of redo the whole thing? Like, what do people do about this?

47:22

We should do, at some point, like, a "2025 will be the year of..." and see what we come up with. I don't know, we'll do a little round-robin.

Yeah, exactly. You ask Claude, I'll ask Perplexity, you use ChatGPT, please. Yeah, exactly. Exactly.

47:37

we could foresee that we probably wouldn't

47:40

be in this business right we'd be

47:42

doing something else if we could be

47:44

forecasting that. But will

47:46

2025 be the year of unstructured data?

47:49

I don't know. But I can tell

47:51

you this: for the last 10-15 years

47:53

most of the data work has been

47:55

done with structured data and structured data

47:58

is very easy. It's like, you know,

48:00

data that's like in rows, columns, tables

48:02

that you can analyze in a pretty

48:04

straightforward way with a schema and most

48:07

of like the modern data stack and

48:09

whatever solutions that we all use and love

48:11

day-to-day have been focused on structured data.

48:13

That being said, if you look at where

48:16

the growth is, I think there's like, you

48:18

know, some crazy estimates from Gartner, you know,

48:20

like 90% of the growth in data will

48:23

come from unstructured data, something

48:25

like that. And

48:27

just to define, you know,

48:29

when we talk about unstructured data,

48:32

we mean things like text, images, etc.

48:34

Well, 80% of that unstructured

48:36

data will be generated by an LLM, so,

48:39

you know, it's turtles all the

48:41

way down, like, you know what I mean. You

48:43

know, I think a former co-founder of Open

48:45

AI said something like we're at peak

48:48

data now, right? Like we're at

48:50

the point where this is the most

48:52

data that we have to train on, and

48:54

from now on, we're going to have

48:56

to rely on synthetic data in order

48:58

to do that. So, you know, and

49:00

that goes back to your question of

49:03

like hoarding data. But going

49:05

back to the unstructured point,

49:07

I think, you know, unstructured data

49:09

is becoming more and more

49:11

important, and the question is

49:13

what to do with it. You know, I think this is

49:15

very early days for this space and I think

49:17

we're still sort of watching and kind of understanding

49:20

what's happening. But I think one of the

49:22

things just to make this really concrete with

49:24

an example, and I think it's a cool example.

49:26

You know, we work with a company that's

49:29

a Fortune 500 insurance company and one

49:31

of the most important types of data

49:33

for them, unstructured data, is actually

49:35

customer service conversations.

49:38

So like, let's say, you know, I have a

49:40

policy or something that I'm upset with and I

49:42

want to chat with someone and then have this

49:44

conversation and you know you can analyze that

49:46

conversation to understand my sentiment, like, you know,

49:48

how pissed off am I like am I

49:50

like yelling "representative, representative," like, I don't know,

49:52

I'm like asking for my manager, or whatever it

49:54

is or you know I'm like super happy

49:56

thank you so much right like that's what

49:58

I mean by saying sentiment. So you

50:00

can sort of analyze like what is

50:03

a conversation like, and basically you

50:05

know you can also ask the user

50:07

for feedback right like sort of scoring

50:10

that. One of the things that this

50:12

customer does is actually use an LLM to create

50:14

structure for this unstructured data. What do

50:16

I mean by that? They basically take

50:19

a conversation and then score that conversation.

50:21

So like zero to ten, this conversation

50:23

was a seven or an eight or

50:26

something like that. Now what's the problem?

50:28

The problem is that sometimes LLMs hallucinate

50:30

and they might give a score that's,

50:32

let's say, larger than 10. What does

50:35

that mean if a score, if a

50:37

conversation scores a 12, for example,

50:39

right? So actually, like, the way in

50:42

which we were working with this company

50:44

is allowing them to observe the output

50:46

of the LLM to make sure that

50:49

the structured data is within the bounds

50:51

of what a human would expect to

50:53

score the unstructured data, which is the

50:55

customer conversation. And so in that instance,

50:58

we're sort of using automation in a way

51:00

that we maybe hadn't expected before in

51:02

order to add value and to sort

51:05

of, you know, in this instance,

51:07

actually like reduce the cost and improve

51:09

the experience for the users in this

51:11

case.
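
To make that concrete, here is a minimal sketch in Python of the kind of bound check being described. Everything in it is illustrative: the function name, the 0-10 range, and the review-queue routing are assumptions for the example, not Monte Carlo's actual implementation.

    # Hypothetical pipeline: an LLM emits a 0-10 sentiment score per customer
    # conversation. Out-of-range or non-numeric outputs (like a hallucinated
    # 12) get routed to a human review queue instead of the warehouse.
    SCORE_MIN, SCORE_MAX = 0.0, 10.0  # the bounds a human scorer would use

    def validate_llm_score(raw_output: str):
        """Return (score, None) if usable, else (None, reason) for review."""
        try:
            score = float(raw_output)
        except ValueError:
            return None, f"non-numeric output: {raw_output!r}"
        if not SCORE_MIN <= score <= SCORE_MAX:
            return None, f"out of bounds [{SCORE_MIN}-{SCORE_MAX}]: {score}"
        return score, None

    accepted, review_queue = [], []
    for conversation_id, raw in [("c1", "7"), ("c2", "12"), ("c3", "great!")]:
        score, reason = validate_llm_score(raw)
        if reason is None:
            accepted.append((conversation_id, score))
        else:
            review_queue.append((conversation_id, raw, reason))

    print(accepted)      # [('c1', 7.0)]
    print(review_queue)  # c2 (out of bounds) and c3 (non-numeric) go to a human

The point is simply that the validation runs on the LLM's output, not its input: the unstructured conversation never needs a schema, but the structure derived from it does.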

51:14

brings up the case of say that

51:16

it just, that scoring that model, it

51:18

just, it shits the bed 10% of

51:21

the time, but it does way better.

51:23

60% of the time and it does

51:25

about the same as a human and

51:27

its overall A little bit cheaper like

51:30

I think that there are there are

51:32

the the tradeoffs and I mean, maybe

51:34

this goes back to earlier The discussion

51:37

that if it's like well, we're gonna

51:39

pull out the one that it said

51:41

at 12 and say You got to

51:43

fix that from happening. That's one approach

51:46

make this never happen the other option

51:48

is It's going to happen. So the

51:50

process needs to be human in the

51:53

loop or human on the loop. Like

51:55

don't don't completely hand this over so

51:57

that you can catch the ones because

51:59

a human would catch it and they're

52:02

the tradeoffs are, and you know what,

52:04

maybe they're even, you know, it's okay.

52:06

You're gonna have a small percentage who

52:09

are totally pissed off, even if you're

52:11

just running humans, because their wait time

52:13

is too long or something else. Is

52:15

your goal to have every customer have

52:18

a delightful experience? It may be a

52:20

different set of customers that are having

52:22

a horrible experience and then probably mode

52:25

if you're connected. You want to make

52:27

sure the ones with the highest predicted

52:29

lifetime value. You're not saying, great, we

52:31

have way fewer customers are pissed off.

52:34

Unfortunately, it tends to skew towards the

52:36

ones that are the highest, you know,

52:38

lifetime value. So, I think that's, yeah,

52:41

I mean, I think that's spot on.

52:43

And I think it's, I mean, one

52:45

of the questions that I remember sort

52:47

of thinking through is, like, is it better

52:50

to have no answer

52:54

or a bad answer. You know, and

52:59

I'm not sure. I can tell you,

53:01

we're not creating, you know, sort of

53:03

agents, if you will, in order to

53:06

say, oh, I don't know, right? That's

53:08

not how you create them. But oftentimes,

53:10

like, that actually might be the better

53:13

answer. I think Tomasz, who I, you

53:15

know, sort of collaborated with on, you

53:17

know, predictions for next year. So to

53:19

us, like, you know, what you'd expect

53:22

is like 75 to 90% accuracy is

53:24

considered like state of the art for

53:26

AI. However, what's often not considered, I

53:29

mean, on the face of it, 75

53:31

to 90% seems, you know, really legit

53:33

and reasonable, but what's not

53:35

considered is like, if you have three

53:38

steps and each has 75

53:40

to 90% accuracy, the combination of

53:42

that is actually an ultimate accuracy of only

53:45

50%, which is, by the way, like,

53:47

worse than a high school student would

53:49

score in that sense. And so is

53:51

50% acceptable? Probably not.
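
As a quick sanity check of that arithmetic, assuming for illustration that the three steps fail independently, the end-to-end accuracy is just the product of the per-step accuracies:

    # End-to-end accuracy of a 3-step pipeline with independent steps.
    for per_step in (0.75, 0.80, 0.90):
        end_to_end = per_step ** 3
        print(f"{per_step:.0%} per step over 3 steps -> {end_to_end:.0%} overall")
    # 75% per step over 3 steps -> 42% overall
    # 80% per step over 3 steps -> 51% overall
    # 90% per step over 3 steps -> 73% overall

At roughly 80% per step, a three-step chain lands right around the 50% figure mentioned here.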

53:54

And so what ends up happening is actually what I

53:56

think we were seeing in the market:

53:58

the market actually took this

54:01

big step back. Like I think a

54:03

year ago, there was this huge rush

54:05

to adopt generative AI and

54:07

to try to build solutions. But as

54:10

we were seeing that the accuracy is

54:12

sort of, you know, at those ranges,

54:14

companies did take a step back and

54:17

actually are reevaluating or rethinking where to

54:19

place their bets or chips, if you

54:21

will. I still find that most companies

54:23

evaluate a solution with a

54:26

human thumbs up or thumbs down like

54:28

was this answer good or not,

54:30

allowing users to just mark like yep

54:32

this was great or no this kind

54:35

of sucked. Companies still have that,

54:37

and I don't think we're moving

54:39

away from that you know unless

54:41

there's sort of a big change

54:43

in the near future.
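
For what it's worth, that thumbs-up/thumbs-down pattern is simple to sketch. The names below (FeedbackLog, approval_rate) are hypothetical, just the shape of logging binary feedback and watching an approval rate per feature so quality regressions surface:

    from collections import defaultdict

    class FeedbackLog:
        """Illustrative only: log binary user feedback per feature."""
        def __init__(self):
            self._votes = defaultdict(list)  # feature -> list of bools

        def record(self, feature: str, thumbs_up: bool) -> None:
            self._votes[feature].append(thumbs_up)

        def approval_rate(self, feature: str) -> float:
            votes = self._votes[feature]
            return sum(votes) / len(votes) if votes else float("nan")

    log = FeedbackLog()
    for vote in (True, True, False, True):
        log.record("support-chatbot", vote)
    print(f"{log.approval_rate('support-chatbot'):.0%}")  # 75%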

54:45

I have a totally unrelated random question,

54:47

Barr. With the companies you're

54:49

working with is the focus of

54:52

reliability and the work you do

54:54

quite different depending on whether data

54:56

is structured or unstructured? Like, in

54:58

the use case you just gave

55:00

like it sounded like it was

55:02

quite different but like what are

55:04

you seeing across the industry? Yeah

55:07

100% like I think the use cases

55:09

that we cover vary tremendously

55:11

based on industry and company

55:13

and I think that's a

55:15

reflection of the variability in what

55:18

you can do with the data across

55:20

the industry. So it can range, you

55:22

know, the types of products that

55:24

we work with can be, you

55:26

know, data products that are in more

55:28

of a regulatory environment

55:30

where, you know, one mistake in

55:33

the data could actually put you at

55:35

risk of regulatory fines. You know, if

55:37

you are using data in some incorrect

55:39

way, or not following what is defined

55:41

as sort of best practices for data

55:44

quality. It's sort of a blanket statement

55:46

that's very high level, but actually like

55:48

is very important in these environments. That's

55:50

like one. The second can be where

55:52

you have a lot of internal data

55:55

products, so you know, like a lot

55:57

of reporting or you know, product organizations

55:59

that are... you know, doing analysis based

56:01

on cohorts or segmentation of your user

56:03

base, you know, a third could be

56:05

data products that are sort of customer

56:08

facing. So for example, if we have

56:10

like, you know, the easiest example

56:12

is like Netflix, you know, recommending,

56:14

you know, your next best view, for

56:16

example, and then, I guess,

56:18

another use case could be, you

56:21

know, a generative data application. So for

56:23

example, like an agent chat bot that

56:25

helps you ask and answer questions about,

56:27

you know, your internal process or your

56:29

internal data. So you can ask really

56:32

basic questions like, you know, how many

56:34

customers do we have? And, you know,

56:36

how many customers have renewed in the

56:38

last few years? Or, if I'm

56:40

in support, I can ask

56:42

how many support tickets has this customer

56:45

submitted in the last year and in

56:47

what topics and, you know, what was

56:49

their CSAT, sort of questions like that.

56:51

And so each of these can

56:53

include structured or unstructured data, and each

56:56

of these can cover very, very different

56:58

use cases and very different applications of

57:00

the data. So if anything, I see

57:02

less and less homogeneous sorts

57:04

of applications of the data, if that

57:06

makes sense. And I actually anticipate that

57:09

this will carry through to the generative

57:11

AI stack. So, you know, people

57:13

create software in a multitude of different

57:15

ways, in a multitude of different stacks,

57:17

the same can be said for data.

57:20

There's not one single stack that rules

57:22

them all. There's not one single type

57:24

of data that rules them all in

57:26

order to create data. I think the

57:28

same will be true for generative AI.

57:30

There's not one single stack or one

57:33

single preferred language of choice. And there's

57:35

not one single preferred method, whether it's

57:37

structured data or unstructured data. I think

57:39

this does very much sort of vary.

57:41

I will say from my biased point

57:43

of view, the thing that is

57:46

common sort of going back to like

57:48

the foundation of truth and sort of

57:50

what is very important is like every

57:52

organization needs to have or needs to

57:54

rely on their enterprise data, and make

57:57

sure that it's high-quality, trusted data

57:59

so that they can actually leverage and

58:01

capitalize on that. And I think it's

58:03

a messy, messy route to get there.

58:05

Maybe 2025 would be the year of

58:07

messiness. Sometimes you just gotta like lean

58:10

into the messiness, you know. You know,

58:12

on our like path, like this random,

58:14

you know, random path to kind of

58:16

figure it out. But there's a lot

58:18

more to figure out there, but

58:21

I don't see us sort of converging

58:23

on like one single path or use

58:25

case or even type of data. All

58:27

right, we've got to start to wrap

58:29

up. This is so good. And yeah.

58:31

Oh, we figured it all out. So

58:34

we're good to wrap. We can before.

58:36

Yeah, exactly. 2025 will just be the

58:38

year of leaning into the mess. And

58:40

maybe that's the best we can do

58:42

right now. Anyway, one thing we love

58:44

to do is go around the horn,

58:47

share a last call, something that might be interesting

58:49

to our audience. Barr, you're our guest.

58:51

Do you have a last call you

58:53

want to share? Sure. So there's this concept

58:55

that someone has shared with me recently,

58:58

which they call sort of watching the

59:00

avocado, if you will. So I don't

59:02

know if you experience this, but you

59:04

know, you buy an avocado and it's

59:06

like, it's not ready, not ready, not

59:08

ready, boom, you're too late. It's already

59:11

like you can't eat it anymore, right?

59:13

That happens to you, right? And so,

59:15

you know, I think the idea is

59:17

like a lot of sort of new

59:19

technologies and trends are like that. And

59:22

in this case, sort of this is

59:24

like, generative AI. Like, we're too early,

59:26

we're too early, we're too early, boom.

59:28

You know, you miss the boat. And

59:30

so I think one of the, you

59:32

know, things that I take away from

59:35

that is like as data leaders, as

59:37

sort of data practitioners, how do we

59:39

keep watching the avocado? We've got to

59:41

hit the avocado before it's too ripe.

59:43

But the timing matters here, especially for

59:46

a lot of these sort of trends

59:48

and technologies. Nobody likes bad guacamole. Anyone

59:50

in the business who now uses that when they're

59:52

talking somewhere internally, if they use the

59:54

analogy, please let us know. I want

59:56

to, I like that. We got to

59:59

watch the avocado. Yeah, it's awesome.

1:00:01

All right, Mo, what about you?

1:00:03

What's your last call? Okay, I've

1:00:05

been doing lots of thinking

1:00:07

about how I make 2025

1:00:10

really great. And I think

1:00:12

one of the tensions I've

1:00:14

found is that like I'm

1:00:16

naturally inclined to like want

1:00:18

to go fast and get

1:00:20

to the place that I want

1:00:22

to get to. And so this is

1:00:25

not anything other than just

1:00:27

kind of a personal learning or

1:00:29

a personal goal that I've set

1:00:31

for myself. It is the start

1:00:33

of 2025 after all, and it's that

1:00:36

I want to be more intentional

1:00:38

about enjoying the journey. And the

1:00:40

analogy I have is I love

1:00:42

going to the beach. Going to

1:00:44

the beach with two small humans

1:00:46

is really fucking hard. There's all

1:00:48

this shit to pack. You've got

1:00:50

to cart it all down there. Everyone

1:00:52

needs sunscreen on, like... And so sometimes

1:00:54

the bit of getting to the beach

1:00:56

is so unpleasant that by the time

1:00:58

you get there, you're all like flustered

1:01:01

and hot and you don't want to

1:01:03

be there and you're like, oh, fuck

1:01:05

it, let's all just go home. So

1:01:07

I'm trying to enjoy the journey to

1:01:09

get there more. So like, I went

1:01:11

to the beach the other day, it

1:01:13

took us an hour to get there.

1:01:15

My kids wanted to stop at this

1:01:17

playground, they wanted to look at the

1:01:19

birds, like, they wanted to have a snack. And I tried to

1:01:22

lean into enjoying the bit to

1:01:24

get there and not focus so much

1:01:26

on kind of the end state. And

1:01:28

it's not just about kids, it's also

1:01:30

about work, right? Because like, if you're

1:01:33

constantly trying to like come up with

1:01:35

this huge amazing strategy and deliver this

1:01:37

project, but like you're miserable in the

1:01:39

months delivering it, that kind of, you

1:01:41

know, defeats the purpose. So anyway, that's

1:01:43

just my intention for the year that I'll

1:01:46

share. What about you, Tim? Well, my

1:01:48

publisher is gonna hurt me if I don't plug

1:01:50

Analytics the Right Way. So,

1:01:52

depending on when you're listening to this,

1:01:54

it is 15 or fewer days

1:01:56

from actually being available, but Analytics the

1:01:58

Right Way is available for pre-order until

1:02:01

January 22nd, at which point it

1:02:03

will be available as a print book

1:02:05

or an e-book and the audio

1:02:07

book's coming out four or five

1:02:09

weeks after that. So that does

1:02:11

have a section talking about human

1:02:13

in the loop versus on the

1:02:15

loop versus out of the loop

1:02:17

and some of the AI tradeoffs,

1:02:19

but it is not an AI

1:02:21

heavy book at all. So that's

1:02:23

my obligatory self-plug, my log-rolling

1:02:25

last call. For fun, I will,

1:02:28

I've definitely last-called stuff from

1:02:30

the Pudding before, but one that

1:02:32

they recently had, it's at pudding.cool,

1:02:34

but it was Alvin Chang, who

1:02:36

got a data set that looked

1:02:38

at a whole bunch of different

1:02:40

roles, and it was how much

1:02:42

they spent of their time sitting

1:02:44

versus standing. So it's kind of

1:02:46

one of those like scrolling visualizations.

1:02:48

You enter kind of some stuff

1:02:50

about your job first, so it

1:02:53

can then kind of locate you

1:02:55

on it. But it's just a

1:02:57

simple x-axis that goes from

1:02:59

sitting all the time for work

1:03:01

versus standing all the time for

1:03:03

work, and then it looks at

1:03:05

a whole bunch of different roles. It

1:03:07

varies what the y-axis is as

1:03:09

you scroll through it. So it's

1:03:11

kind of just a fun visualization of

1:03:13

how tough on bodies a lot

1:03:15

of our professions are because they're

1:03:17

required to crouch or stand all

1:03:20

the time. They can't take breaks

1:03:22

and that sort of thing. But

1:03:24

it's just kind of a fun

1:03:26

interactive visualization. So worth checking out

1:03:28

What about you, Michael?

1:03:30

What's your last call? I mean,

1:03:32

it was gonna be the book.

1:03:34

Tim, I was. I was actually

1:03:36

ready to do one on the

1:03:38

book for you just in case

1:03:40

you didn't cover it, so good

1:03:42

job. We'll report back to your

1:03:44

publisher that you're doing it. You're doing

1:03:47

what you can do? No. So

1:03:49

actually mine is Recast, who

1:03:51

I think is some of the

1:03:53

best in the game when it

1:03:55

comes to Media Mix models. They've

1:03:57

started publishing a series of YouTube

1:03:59

videos on how to think through

1:04:01

the creation of those models and

1:04:03

I think it's a great watch

1:04:05

for anybody who's engaging with that

1:04:07

kind of data so I'd highly

1:04:09

recommend it. They've put a

1:04:11

couple out already and then I

1:04:14

think there's some more to come

1:04:16

so that would be my last

1:04:18

call. All right, so what is

1:04:20

2025 the year of? I would

1:04:22

just say one word. Everybody has

1:04:24

to go around and do like

1:04:26

a one-word answer, or like

1:04:28

a fast... No, no, nothing. Moderation.

1:04:30

I think, I think 2020, yeah,

1:04:32

there you go. I think 2025

1:04:34

is going to be the year

1:04:36

of being thoughtful, keeping up with the

1:04:39

work, increasing insights, maybe helping with

1:04:41

process. None of that's actually going

1:04:43

to happen, but I just sort

1:04:45

of like wish it were. So

1:04:47

that's my take on it. So

1:04:49

you used the one word for

1:04:51

all of us. You just, you

1:04:53

kind of took it. We all deferred,

1:04:55

or, well, nobody answered, Tim, so

1:04:57

I just figured we were not

1:04:59

gonna. I yielded my one word

1:05:01

to you, so yeah, I like

1:05:03

it. So, I couldn't think of

1:05:06

a better person to help us

1:05:08

kick off 2025 with than you,

1:05:10

Barr. Thank you so much for

1:05:12

coming on the show. It's been

1:05:14

awesome. Absolutely. I hope 2025

1:05:16

will be you know even better

1:05:18

and greater than 2024 and you

1:05:20

know I would probably be remiss

1:05:22

if I didn't say that 2025

1:05:24

would be the year of highly

1:05:26

reliable data and AI. That's right.

1:05:28

What's the saying? From your mouth

1:05:30

to God's ears, or whatever that is,

1:05:33

though. We absolutely would want that.

1:05:35

Amen. Thank you so much. Awesome.

1:05:37

Thank you so much for coming

1:05:39

on the show again. And of

1:05:41

course, no show would be complete

1:05:43

without a huge thank you to

1:05:45

Josh Crowhurst, our producer, for getting

1:05:47

everything done behind the scenes. As

1:05:49

you've been listening and thinking about

1:05:51

2025, we'd love to hear from

1:05:53

you. Feel free to reach out

1:05:55

to us. You can do that

1:05:58

via our LinkedIn page or on

1:06:00

the Measure Slack chat group or via

1:06:02

email at contact at analytics

1:06:04

hour.io. We'd love to hear your

1:06:06

thoughts, or other things that you think are

1:06:09

big topics for 2025 in the world

1:06:11

of data and analytics. So once again,

1:06:13

Barr, it's a pleasure. Thank you so

1:06:15

much for taking the time. We really

1:06:18

appreciate having you on the show again.

1:06:20

And you know, you're on track now.

1:06:22

We keep talking about the Five Timers

1:06:24

jacket. That's gonna be a thing. So

1:06:27

you're in the running. There's only been

1:06:29

a few people that have done this a

1:06:31

couple of times. Are you prepared to

1:06:33

have five kids, I guess is the

1:06:36

question. Like, we may need to break.

1:06:38

Anyway, so of course, I think I

1:06:40

speak for both of my co-hosts, Tim

1:06:42

and Mo, when I say, no matter

1:06:44

where your data is going, no matter

1:06:47

the AI model you're using,

1:06:49

keep analyzing. Thanks for listening.

1:06:51

Let's keep the conversation going

1:06:53

with your comments, suggestions, and

1:06:55

questions on Twitter at

1:06:57

@AnalyticsHour, on the web

1:06:59

at analyticshour.io, our LinkedIn

1:07:01

group, and the Measured Chat

1:07:03

Slack group. Music for the

1:07:05

podcast by Josh Crowhurst. So

1:07:07

smart guys wanted to fit

1:07:09

in, so they made up

1:07:11

a term called Analytics. Analytics

1:07:14

don't work. Do the analytics

1:07:16

say go for it no matter

1:07:18

who's going for it? So if

1:07:20

you and I were on the

1:07:22

field the analytics say go

1:07:24

for it? It's the stupidest

1:07:27

laziest, lamest thing I've ever

1:07:29

heard for reasoning in competition.

1:07:31

So my yeah, my smart

1:07:34

speaker decided to weigh in

1:07:36

on that. I love it. What

1:07:38

did they have to say about that? Yeah. It's

1:07:40

the perfect little end note to that

1:07:42

particular. Yeah. Yeah. And Tim, probably be

1:07:44

a few minutes for you. Yeah. Her

1:07:46

thumbs down that. And the background was

1:07:48

saying, nope, I don't think I can.

1:07:50

Actually, it basically said I don't know

1:07:52

now that I think about it. It

1:07:55

was like, whatever it decided it heard,

1:07:57

which was nothing. Yeah. Perfect. Perfect. Rock

1:08:00

flag and lean

1:08:02

into the mess!
