#253: Adopting a Just In Time, Just Enough Data Mindset with Matt Gershoff

Released Tuesday, 3rd September 2024

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.

0:05

Welcome to the Analytics Power Hour. Analytics topics covered conversationally

0:10

and sometimes with explicit language. Hi everyone, welcome. It's the Analytics

0:15

Power Hour and this is episode 253. Almost every time I've attended the

0:23

SUPERWEEK conference in Hungary over the past seven years, a major theme

0:28

is how much our industry is changing. And lately, especially with privacy

0:33

regulations, there are new laws that impact our industry. And the other thing I usually

0:38

take from that conference are new ideas about where the industry's heading

0:42

and how we adapt to these changes. And I think

0:45

this conversation on this show will be similar, I think, in a lot

0:49

of ways. And with new constraints on how, when, and where we collect

0:54

and store data, it's high time to embrace new paradigms where we can

0:59

find new ways of thinking about data collection and usage in a privacy

1:05

first world. So let me introduce my co-hosts, Julie Hoyer.

1:09

Welcome. Hey there. That's awesome. And Tim Wilson, who has been with me

1:14

many times at SUPERWEEK. Welcome. Thought we weren't gonna talk about that.

1:18

It's on video. What happens at SUPERWEEK stays at SUPERWEEK? Stays at SUPERWEEK.

1:23

Well yeah, we won't talk about a lot about it. All right. And

1:29

I'm Michael Helbling. And our guest today needs no introduction, but let

1:33

me do a little bit. He is the CEO of Conductrics, an amazing

1:37

thinker and speaker, and is our guest again for the third time.

1:40

Welcome to the show, Matt Gershoff. Thanks for having me. A real honor.

1:45

It's awesome. I'm thankful to have you too. And actually, as I was

1:48

thinking about it, I was like, well, you've been there at most of

1:50

these SUPERWEEKs as well. And your company Conductrics sponsors that event.

1:54

And I remember that very fondly. I mean, seeing you there all those

1:59

years, it's also really fun. But one thing, Matt in this topic specifically,

2:06

you consistently in our industry, for me anyway, are usually one of the

2:11

people who are sort of about five years ahead of what a lot of

2:16

people are talking about. And so I think it's really interesting that one

2:20

of the things you're really talking about now around sort of this world

2:24

of new privacy laws and things like that is about adopting a mindset

2:28

of just in time or just enough data or sort of privacy first

2:33

mindset around data. And so kind of maybe as a starting point,

2:37

what got that going for you when? And what kind of spurred that

2:41

as sort of a major area of thinking and writing for you over

2:45

the last couple of years? Sure. Well, first, thanks for having me.

2:48

And thanks for having me, Tim and Julie. This is

2:52

gonna be fun. Looking forward to it. Well, actually just to step back

2:56

a little bit, the work that we've been looking into

2:59

and working on within Conductrics around privacy engineering and data minimization

3:06

is really less about privacy per se, and really more about

3:12

thinking about why we're doing analytics and experimentation in the first

3:19

place. And so I think for us, we have a slightly different view

3:25

of the value of experimentation. And just so that the listener understands

3:30

where I'm coming from, is that Conductrics is in part an experimentation

3:35

platform where you might do A/B testing and multi-armed bandits and that

3:40

type of thing, where you're trying to learn

3:42

basically the marginal efficacy of different possible treatments.

3:46

And for us, we really feel like the value of experimentation

3:54

is that it provides a principled procedure for organizations

4:00

to make decisions intentionally, to make them explicitly,

4:05

and to consider the trade offs between competing alternatives. And ultimately,

4:11

the reason for doing this is to act as advocates, sort of the

4:15

front line for the customer. And so we have a much more,

4:21

I guess, hospitality or omotenashi approach to why experimentation, why

4:29

one really should be doing experimentation. And I think that's true of analytics

4:33

more generally. It's like, really, why are we doing it?

4:37

And I think one of the issues that I've seen in the,

4:41

I don't know, almost 25, 30 years that I've been in the analytics space

4:45

is that sometimes analytics tends to become, kind of lose that focus.

4:53

And we tend to have programs that become

4:57

almost ritualized. So we sometimes start doing behaviors

5:02

just to do them, and we kind of lose the focus of

5:06

really why and what the ultimate objective is. And so for us,

5:11

part of the reason why privacy engineering and data minimization is something

5:18

that we've gravitated towards was, one, part of that is really about respect

5:22

and being customer focused. But also, two is that it really forces one

5:29

to think intentionally. And we ask the question,

5:33

what is sort of the marginal value of the next bit of data?

5:38

Like, why should we collect this next piece of data or the added

5:42

data? And to really have some sort of editorial and expertise about why

5:48

we might be getting more information about the user when we might not

5:53

really need it in the first place. And so this idea of intentionality

5:56

is really what underpins both experimentation for us as well as why we

6:03

were interested in moving towards having a more

6:09

data minimization approach to the experimentation platform. So you said

6:13

sort of the ritualized behavior, which you came up with,

6:18

as I recall, you sort of came up with two and then you added a third. You said, oh,

6:24

there are these mindsets of data, I wanna get data just in case,

6:31

just in case I need it. And that, I think, falls under that

6:33

kind of ritualized behavior, gather all the data,

6:39

not considering the incremental value of it. And you contrasted that with

6:43

just in time. And then you added like just enough, I think,

6:47

a little bit later. But does that fit that we're kind of making

6:53

a broad generalization in analytics? And I think even in experimentation,

6:58

there's a tendency to say that next bit of data,

7:03

the cost to collect it is near zero. So let me collect it

7:07

just in case down the road. And that just is kind of ballooned out

7:12

that you add on a million additional data points. And now you're just

7:16

in the habit of just collecting everything and sort of

7:20

lost the idea that you're actually trying to figure out what you're doing

7:24

with it. Yeah, that's a good question. That's a good comment.

7:28

Really what it is that if you think about it,

7:32

the GDPR and data privacy, most of that conversation has been around compliance.

7:38

Which, and what you can't do. And a lot of that is really

7:42

sort of a procedural thinking, like do you follow certain procedures for

7:46

risk mitigation? And really what I think the privacy legislation is really

7:52

about is to encourage privacy being embedded in technology, being embedded

7:59

in processes by default. It's not that you shouldn't collect

8:04

data if it's required. It's not that if you have a task

8:07

and you need the data in order to achieve the task,

8:11

no one's saying that one shouldn't collect that. It's really about asking

8:15

for a particular task, whether or not the data

8:19

is pertinent. And it's about being sort of respectful to users and not

8:22

collecting more than that's needed. Now that privacy by default

8:27

is in contrast to what I think a lot of the thinking had

8:32

been or currently is in sort of analytics and data science,

8:36

which is really a data maximalist approach, which is

8:41

collect everything by default. And again, as you say, the sort of the

8:45

marginal cost of the next level of granularity, right? So we can think

8:51

of more data as being finer and finer levels of granularity for any

8:56

particular data element, or it could be additional data elements

9:02

and it can also be additional linkage. And so that's sort of that

9:06

whole 360 and so that every element or event can be traced back

9:11

or associated with an individual. So you kind of have those three dimensions

9:16

to expansion of data. And so I was really trying to point out

9:21

is that a lot of that data collection

9:26

is somewhat mindless. It's just that just in case and underpinning it,

9:30

is it really an explicit objective, right? We're not, we don't have a

9:34

particular task and we're collecting data for this particular purpose. Like

9:40

in an experiment, I was talking about just in time is because we

9:44

have the task. I need to know the marginal efficacy of one

9:49

treatment over another, one experience over another. And so then I need

9:52

to go out and collect data for that task

9:56

versus just in case it's really, I don't really know what the question

10:00

is that I'm gonna ask, but I'm gonna collect it anyway.

10:04

Now, why am I gonna collect it? Well, really there's sort of a

10:07

shadow objective, which is one based upon magical thinking, which is

10:13

all of the value is in that next bit. It's almost like the

10:17

gambler who's at the table when they're losing and they just have to

10:22

believe that the next hand is where the big giant payoff is.

10:26

That often gets rationalized in data science and venture land as sort of

10:33

fat tails, right? And so there's some sort of huge, there's huge payoffs

10:37

out there lurking in the shadows and you just need to have reached some

10:41

sort of threshold of critical mass in order to achieve it.

10:46

And I'm not saying that that doesn't exist, but it's unlikely that it

10:50

exists in the probabilities that people think. So that's one side of things,

10:57

which is this magical thinking that all the value is in the data

11:00

that I haven't collected. And then secondly, it's about minimizing regret.

11:04

So it's like, well, I don't wanna not have collected it in case

11:08

I need it in the future. My boss asked for it.

11:11

And so we collect it. And that's sort of collection by default.

11:15

And that is not consistent with the privacy by default. And that's really

11:24

the law. And so that's not to say, though, that discovery

11:30

isn't something that's also important. So it's not about being paternalistic

11:34

and saying, don't collect data or there's a certain way that you have

11:39

to do it. Really, all we're talking about is just being thoughtful about

11:43

it and being intentional. So it's like, hey, I think perhaps that if

11:47

we had the company may think or you folks may think that,

11:50

hey, from this particular company or a client, if they had X data,

11:54

then they could solve tasks A, B, C, D, X and Z,

11:59

whatever. And that seems totally reasonable to me. Then you have a reason

12:05

to go collect that data and then check, Okay well, does it look

12:07

like this data is informing these decisions or helping us make decisions?

12:12

But that's entirely different than just collect everything.

12:16

And I think that just in case collect everything, one, it being mindless,

12:21

there is no objective to having it other than to have it, really

12:25

opens organizations up to grift. The sales pitch, which is

12:31

can you afford not to collect it? A lot of that stuff.

12:35

And that's prevalent in our industry. And so I really think

12:39

it's really about being mindful. And it's really about

12:44

this idea that the real value is not in the data or in

12:48

any statistical method or any technology. It's really in the editorial

12:53

and the expertise and really the taste. It's like, does the company have

12:56

taste to be thinking about what is gonna be useful for their customers

13:01

and to be cognizant of what the customers need or have empathy for

13:05

them and to be using information about them in a way that's respectful?

13:09

That's really all, that underpins all of this.

13:16

It's time to step away from the show for a quick word about

13:19

Piwik PRO. Tim, tell us about it. Well, Piwik PRO has really exploded

13:24

in popularity and keeps adding new functionality. They sure have. They've

13:28

got an easy to use interface, a full set of features with capabilities

13:33

like custom reports, enhanced e commerce tracking and a customer data platform.

13:38

We love running Piwik PRO's free plan on the podcast website,

13:42

but they also have a paid plan that adds scale and some additional

13:45

features. Yeah. Head over to piwik.pro and check them out for yourself.

13:50

You can get started with their free plan. That's piwik.pro. And now let's

13:55

get back to the show. Well, it's funny, too, that

14:00

working with a lot of clients that do the just in case collection,

14:04

because, again, it is widespread. It's the norm across the industry,

14:06

I would say. I have run into so many situations where we go

14:11

and they ask a very important business question and we start with like that

14:14

question first and then they say, and we have all this data that

14:17

we can pull in and we have so much we should be able

14:19

to answer this. No problem. And time and time again, I start getting

14:23

into like the actual requirements of what the data needs to be able

14:26

to do to answer this great question. And then we find out that

14:29

even though just in case they've been collecting all of it,

14:31

it's not in the right structure or things can't be joined the right

14:35

way, whatever it is between the tool and the actual data structure itself,

14:39

we can't answer the question they care about. And so it would still

14:42

be then defining in that moment going forward, like, what do we actually

14:46

need to be collecting for you to answer this business question?

14:50

And it's funny because one of the examples I had was actually working

14:53

in Adobe Analytics, or actually Adobe CJA. And we were bringing in a

14:58

data set from, let's say, like Salesforce. And I started to have this

15:02

conversation with my stakeholders saying, you're asking great questions,

15:06

but you're asking questions that we're used to being able to ask the

15:10

data that would come in through Adobe that we were used to for

15:12

years with Adobe Analytics. And now you have this data coming in from

15:16

Salesforce, which was structured and designed to answer different types

15:20

of questions. And so they don't map perfectly together. And so now we're

15:24

starting to talk to them about how could we rework this and actually

15:28

bring in the data in a way to answer the questions you care

15:31

about and that your stakeholders coming to you actually need.

15:36

Yeah, the main thing is to be intentional. Now, but to be fair,

15:38

like some of those companies that you've mentioned in the past,

15:42

they were sort of masters of this collect everything

15:46

and magical stuff is gonna happen. And then all of the use cases

15:50

wound up being error handling because the site was broken. And so

15:56

that's not really a community that has been

16:01

totally innocent of maybe overselling collecting data. I mean, data is not

16:09

information. And I think it's important to think about

16:13

kind of like the entropy of what you've collected, like how compressible

16:19

is the data? And so a lot of times you have data,

16:23

but it's not information. It doesn't help you reduce uncertainty in a particular

16:31

question that you're asking. And that's what information does. And just

16:35

because there's bits being collected does not mean there's more information.
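
Matt's point about entropy and compressibility can be made concrete. A minimal sketch (the data is hypothetical, not from the episode): Shannon entropy puts an upper bound on how much information each observation in a column actually carries, no matter how many rows were collected.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy in bits: an upper bound on how much
    information each observation can carry."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A column that is almost all one value is highly compressible:
# lots of bits collected, almost no information.
near_constant = ["US"] * 98 + ["CA"] * 2
varied = ["US", "CA", "MX", "UK"] * 25

print(round(shannon_entropy(near_constant), 3))  # ~0.141 bits
print(round(shannon_entropy(varied), 3))         # 2.0 bits
```

Both columns have 100 rows, but the first is nearly worthless for reducing uncertainty, which is Matt's distinction between data and information.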

16:41

Well, and it feels like my concern is that it's already a problem.

16:46

It already is the, and you said it was kind of the laziness

16:49

of avoiding thinking of saying, well, just collect everything. I mean, the

16:52

number of times that I've got experiences where somebody said,

16:57

oh, the data collection requirements are pretty straightforward. Just collect

16:59

everything. And it's like, well, no that's lazy and simple for you to articulate.

17:04

It's actually showing that you're not thinking through what you're going

17:07

to do. I feel like we've been in that mode with lots of

17:12

forces sort of pushing that idea, that idea of I wanna have the

17:18

option to look at this data and hopefully it's structured well

17:23

with the, a chunk of the world of AI and

17:28

the next generation of the technology vendors jumping on that train or kind

17:33

of spinning the, well, to do AI, like the more data,

17:37

the better. And there, we're running out of data already to train the

17:41

models. And I'm afraid that's pouring kerosene on a raging,

17:48

poorly functioning fire already that now people get to wave their hands

17:52

and say, I'm doing this for the future of AI. It's just like

17:56

the next level of a lack of intentionality of

18:00

surely if I get even more data, then the AI will be able

18:05

to kind of run through it. But it's really just amplifying,

18:08

I think the same problem that you articulated when

18:13

very clear and concise questions may mean that you need to collect a

18:21

very small amount of data for the next

18:25

month, as opposed to you've got boatloads of data you've captured for the

18:30

last five years that actually aren't that helpful,

18:33

but you're gonna force yourself to go wade through that, trying to do

18:36

something that if instead you had intentionality and said, I'll just go

18:39

forward, like having that historical data, it actually makes it harder to

18:44

have the discussion of what's the best data to collect just enough of

18:51

just in time to answer that question. Oh, that's that new data.

18:55

And it's like, well, new data, what are you talking about?

18:58

We have this ocean of data. What can you do with that?

19:03

Well, what I can do with that is a much more complicated,

19:05

messier, actually less good at answering the question.

19:10

But yes, we're checking off the box that you can point to

19:14

your just in case mindset is having, helped me answer a question.

19:18

It actually wasn't the best way to answer the question in many cases.

19:22

Yeah, and I get so many times like, what can, just do what

19:25

you can do with the big messy historical data that we just in

19:28

case captured when I tell them like, oh, well to really answer this,

19:32

yeah, maybe it should be different data looking forward in a test.

19:36

And they're like, eh, yeah, well, we don't wanna do that. So what's the best you can give us from the other stuff?

19:40

Yeah, and just to be fair, I didn't use the word lazy.

19:44

I just think maybe just unaware. Yeah, I mean, I just think it's, I

19:50

think the value is in being aware and being explicit. That's what I

19:54

think data teams and companies should be doing.

19:58

And I think that's where the success is. And it's not in doing

20:02

analytics. It's analytics in the service of having

20:08

a well thought out understanding and model of

20:12

the customer and the environment that you're in. But this, again,

20:15

this isn't to be paternalistic and saying, I don't know, it's not for

20:19

me to say what companies in particular context should be doing or shouldn't

20:23

be doing. I just know for us, when we re architected the software

20:28

back in 2015, we were aware of GDPR, and we read up on

20:33

privacy by design, which are principles that, I think, came in the mid '90s from

20:39

Dr. Ann Cavoukian, I believe. And there are seven main principles. And the

20:44

GDPR and other privacy frameworks have incorporated those principles into

20:52

their legal frameworks. And one of them is principle two, which is privacy

20:58

by default. And so, and I think principle three or four might actually

21:04

be by embedding. And this idea is that the software and systems

21:09

should have these, should be privacy by default, by design, and it shouldn't

21:13

be like a bolt on. And so customers should be able to use

21:16

the services by default in a privacy preserving way.

21:21

And it's really only in cases, you need to like move up from

21:26

the default as opposed to the current approach, which is collect everything

21:30

and moving down from that. It's really inverted and it really should be,

21:35

you should be collecting as little as possible to solve the task.

21:37

And we just realized that actually experimentation at least, and I'm not

21:42

saying everything, but at least in experimentation, many,

21:47

if not most, and actually most of the tasks in A/B testing experimentation

21:52

can be done following a data minimization principle, which means we really

21:58

do not need to link all the information together. We do not need

22:01

to collect IDs. And we can store data in what are known as

22:07

equivalence classes. You can kind of think of that as like a pivot

22:10

table. And so the data is stored at basically an aggregate level.
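
The equivalence-class storage Matt describes can be sketched roughly like this (the field names and values are hypothetical, not Conductrics' actual schema): each cell of the "pivot table" holds running aggregates, and no per-user row or ID is ever kept.

```python
from collections import defaultdict

# Each equivalence class is one cell of a pivot table:
# all users who look identical share a single aggregate row.
classes = defaultdict(lambda: {"n": 0, "conversions": 0, "revenue": 0.0})

def record(variant, device, converted, revenue):
    """Fold one visit into its equivalence class; no user ID
    and no individual row is ever stored."""
    cell = classes[(variant, device)]
    cell["n"] += 1
    cell["conversions"] += int(converted)
    cell["revenue"] += revenue

record("A", "mobile", True, 25.0)
record("A", "mobile", False, 0.0)
record("B", "mobile", True, 40.0)

print(classes[("A", "mobile")])  # {'n': 2, 'conversions': 1, 'revenue': 25.0}
```

Auditing what has been collected then amounts to listing the keys of this table, which is the efficiency point Matt makes next.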

22:15

But even though the data is stored in an aggregate way,

22:19

which allows us to use ideas from privacy approaches such as k-anonymization,

22:26

we can talk about that if that's of interest, we kind of use

22:29

ideas of k-anonymity to help the client, A, be able to audit

22:35

what data has actually been collected in a much more efficient way.

22:39

So it's very easy to know what you have and whether or not

22:41

it's in breach of any privacy guidelines you might have.

22:45

But also it means that we can do

22:48

the analysis in a much more computationally efficient way.

22:53

And so there's a lot of nice benefits from

22:57

following or embedding privacy by design principles into your systems and

23:03

procedures, which are beyond just having less data about the individual.

23:10

The main thing is that it encourages this idea of intentionality, just being

23:14

aware of what you're collecting and why. But that doesn't mean it's appropriate

23:18

in all cases. That's not what I'm saying here. It's just more of

23:22

an option. Well, and Matt, because I've now read and seen you talk

23:27

about this, like it kind of blew my mind a little bit when

23:31

it sort of clicked. And I think it was an indication of how

23:35

sort of stuck in the standard way of doing things was that when

23:39

it comes, if we just talk simple A/B testing on a website,

23:43

and we know that we need to know, let's just go with A

23:46

and B, that we've got, that you're treated with A, you poke around

23:50

on the website some more, you convert or you don't convert,

23:53

store row. Your B, you poke around on the website, you convert,

23:57

maybe you don't convert, and the amount. And it seemed like,

24:01

well, obviously, you have to have every one of those rows.

24:05

And then when you're done, you just kinda,

24:09

you pivot it and you compare the conversion rates and you gotta do

24:12

some other little t test kind of math.

24:15

And what kind of blew my mind is you were like,

24:20

well, wait a minute, what if instead you just incremented

24:24

counters? Because that step that I just glossed over of saying,

24:28

I've taken 10,000 rows of individual users and rolled them up so that

24:33

I could do the actual calculations that are done behind the scenes,

24:38

you were like, well, wait a minute, if what you need is a count, you

24:42

can just increment how many people got A, how many got B.

24:45

If you need the sum of how many converted, you don't have to

24:51

have all those rows, you can just increment a counter and say,

24:54

you're A, I need to track you in the session long enough to

24:58

increment the counter, I don't need to store a whole row,

25:00

I just need to increment a counter. And then where I really counted was

25:03

like, oh, and then if you need sum of squares, I can square

25:08

each value and then do the sum. 'Cause like, so, like you're literally

25:14

getting from, you have what was 10,000 rows and it winds up being

25:19

two rows that you're just incrementing. And that was kind of

25:23

your point saying, I can do, I can give you all the results

25:27

that you get from a standard A/B testing platform

25:31

in a standard basic A/B test. And that's just one scenario,

25:35

but I didn't gather even IDs. I just had to have in a

25:39

very limited temporal way until I could log

25:44

which class they went in and what the result was. And I can

25:48

just keep incrementing that. So one, did I state that fair?

25:52

Like that, if the listeners are like, what is he talking about, anonymization?
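
Tim's counter description can be sketched in a few lines (the data is simulated and any real platform's internals will differ): count, sum, and sum of squares per arm are sufficient statistics for a Welch t test, so no row-level or per-user storage is needed.

```python
import math

# Per-arm running aggregates: count, total, and total of squares are
# sufficient statistics for a Welch t test; no row-level data needed.
arms = {"A": [0, 0.0, 0.0], "B": [0, 0.0, 0.0]}  # [n, total, total_sq]

def observe(arm, value):
    a = arms[arm]
    a[0] += 1
    a[1] += value
    a[2] += value * value

def mean_var(arm):
    n, s, ss = arms[arm]
    mean = s / n
    var = (ss - n * mean * mean) / (n - 1)  # sample variance from aggregates
    return n, mean, var

# Simulated outcomes (0 = no conversion, 1 = conversion)
for v in [1, 0, 0, 1, 0]:
    observe("A", v)
for v in [1, 1, 0, 1, 0]:
    observe("B", v)

na, ma, va = mean_var("A")
nb, mb, vb = mean_var("B")
t = (mb - ma) / math.sqrt(va / na + vb / nb)  # Welch t statistic
print(na, ma, nb, mb)  # 5 0.4 5 0.6
```

The 10,000 individual rows Tim mentions collapse to two short rows of counters, yet the t statistic comes out identical.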

25:56

Yeah, I don't wanna, yeah. So yeah, I don't wanna get in too much like, because this is like, this is gonna, I don't wanna lose

26:00

the listener here in too much minutiae here. But just to, but yes,

26:04

you're right. And so really, the realization was,

26:08

and what some of the listeners I'm sure are aware of,

26:11

but some may not be. To be fair, you headed down the k-anonymization path

26:15

before I tried to do my summary. So

26:19

I don't want to be like, oh Tim, oh Tim, you're getting too detailed in

26:22

the weeds. No, we're getting, yeah, no, and really let's blame Julie because

26:26

we said beforehand that she was supposed to keep us from.

26:31

But just at a high level, it turns out that actually,

26:36

what underpins most of the analysis, or an approach to most analyses of

26:43

the tasks that folks in experimentation need to do,

26:48

is really, is regression. It's like least squares. I'm not gonna go into

26:52

like, we don't have to go into like how it's done and all

26:55

that stuff. But it turns out that one is able

26:59

to do a regression analysis, do various regression analyses on data that

27:05

has been stored in equivalence classes in a certain way.

27:09

So the main takeaway is that we can store data in an aggregate

27:15

way such that we can do the same analysis as if we had the

27:20

data or most of the same types of analysis as if we had

27:25

the data at the individual level. And so

27:29

what are the types of tasks that we can do? Well,

27:32

as you said, we can do t tests which is sort of like

27:34

the basic frequentist approach for doing an experiment when we're kind of

27:37

trying to evaluate the treatment effect and try to account for the sampling

27:42

error. But also things like multivariate analysis and ANOVA, analysis of

27:48

variance, which you might do for multivariate tests. You might be doing something

27:53

like interaction checks. So maybe you have some sort of, like Conductrics

27:57

has some sort of alerting system where we're checking between different

28:01

A/B tests whether one A/B test might be interfering with another.

28:05

Underneath the hood, that's really for the folks who know some stats in

28:10

your listener base, it's really just doing like a nested partial F test

28:13

between two regression models, a full model and a reduced model.
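
The claim that regression runs on aggregated data can be illustrated with a toy example (hypothetical numbers, and a deliberately minimal solver): frequency-weighted least squares on equivalence-class means reproduces the coefficients you would get from the individual rows.

```python
# Frequency-weighted least squares on equivalence classes matches OLS
# on the individual rows. A toy sketch for y = b0 + b1 * treated,
# solved via the 2x2 normal equations.

def wls_treatment(cells):
    """cells: list of (treated, mean_y, n) aggregate rows."""
    n = sum(c[2] for c in cells)
    sx = sum(c[0] * c[2] for c in cells)          # weighted sum of x
    sxx = sum(c[0] * c[0] * c[2] for c in cells)  # weighted sum of x^2
    sy = sum(c[1] * c[2] for c in cells)          # weighted sum of y
    sxy = sum(c[0] * c[1] * c[2] for c in cells)  # weighted sum of x*y
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    return b0, b1

# Individual rows: control converts 1 of 4, treated converts 2 of 4.
rows = [(0, 1), (0, 0), (0, 0), (0, 0), (1, 1), (1, 1), (1, 0), (1, 0)]

# The same data collapsed to two equivalence-class cells.
agg = [(0, 0.25, 4), (1, 0.5, 4)]

b0_agg, b1_agg = wls_treatment(agg)
b0_row, b1_row = wls_treatment([(x, y, 1) for x, y in rows])
print(b0_agg, b1_agg)  # 0.25 0.25: intercept = control mean, slope = lift
```

The aggregate and row-level fits agree exactly, which is why t tests, ANOVA, and the nested F tests Matt mentions can all be driven from the same aggregated store.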

28:17

All of those things can be done and even. I was gonna say

28:19

that, but I was trying to keep it up a little high level. It's just

28:21

more than t tests and even, there's like a lot of buzz and

28:27

I think exaggeration around things like CUPED, which is really

28:34

regression adjustment in the experimentation space. Even that

28:38

can be done on aggregate data. Now, the main point about it being

28:44

aggregated is really about data minimization, which is one, reducing the

28:49

cardinality of any data field, which is the number of unique elements that

28:53

we might wanna store. So instead of storing

28:58

the sales data, the pre-sales data of the user

29:01

from some arbitrary precision of cents, maybe it makes sense to have it

29:06

in some sort of 10 bins that represent sort of the average value

29:11

of each bin. So from zero to 10, where the average value in

29:16

the 10 bin is like $1,000 or something. So the main idea is

29:21

to reduce sort of the fidelity and sort of down sample some of

29:26

the data that you're collecting so that you have fewer unique elements

29:31

within each data field and to collect fewer data elements and maybe to

29:36

decide when you wanna co-collect elements. So

29:40

one can collect the data such that, let's say there's 10

29:44

segments, types of segment data that we might wanna collect within the experiment.

29:49

We can store those as 10 separate tables so that you can do

29:53

10 separate analyses or you can have them stored, you can collect them,

29:58

co-collect them. Maybe we wanna have these two or three collected at

30:01

the same time or maybe up to 10. As you add,

30:05

you co-collect data, you increase the joint cardinality, the number of unique

30:11

combinations and that's the thing that you kind of wanna manage.
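
The binning and joint-cardinality ideas can be sketched as follows (the bin width, field names, and threshold are illustrative, and as Matt notes this is a useful measure, not a privacy guarantee): down-sample a high-precision field into coarse bins, then check the smallest group size before deciding to co-collect segments.

```python
from collections import Counter

def bin_revenue(amount, width=100):
    """Down-sample arbitrary-cent precision into one of 10 coarse bins."""
    return min(int(amount // width), 9)

def min_group_size(records, fields):
    """Smallest equivalence class if these fields are co-collected.
    Compare against a floor like k = 10 before storing the
    combination; a k-anonymity style check."""
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    return min(combos.values())

records = (
    [{"device": "mobile", "region": "EU", "rev_bin": bin_revenue(25.37)}] * 12
    + [{"device": "desktop", "region": "EU", "rev_bin": bin_revenue(980.0)}] * 11
    + [{"device": "desktop", "region": "US", "rev_bin": bin_revenue(40.0)}] * 1
)

print(min_group_size(records, ["device"]))            # 12: safe to store alone
print(min_group_size(records, ["device", "region"]))  # 1: too specific to co-collect
```

Adding a second field drops the smallest group from 12 users to 1, which is exactly the joint-cardinality growth Matt says you want to manage.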

30:15

It's like how many unique combinations of segment information do we wanna

30:21

collect? And the measure that we might wanna use is the number of

30:27

users that kind of fall within each one of those groups,

30:30

each of those combinations. And maybe we wanna have at least 10 users

30:35

that fall into each one of those combinations such that we're never really

30:40

collecting data on any individual user, we're collecting data on collections

30:45

of users who look exactly the same. And so that's really that idea

30:50

of k-anon is how many other people look exactly the same in

30:56

the data set. And so you might wanna have some sort of lower

31:00

bound on that, say five or 10. And that's a good way to

31:03

measure, it doesn't provide privacy guarantees, but at least it's a good

31:08

measure to be aware of how specific or the resolution of the data

31:16

you're collecting about each individual. I like what you're saying. I think

31:21

one of the challenges that I'm thinking of right now and maybe it's

31:25

just dumb, but I feel like a lot of organizations lack

31:30

the underlying knowledge to start making those groupings or buckets

31:35

in the first place. And then sort of my question is sort of

31:39

then how do they get that level of information or knowledge to be

31:43

able to take that next step? Or is it they feel emotionally like

31:48

they're making the buckets, they're like, but buckets are less precise.

31:51

I need to be more precise. And that's just the right, that's the. I feel

31:54

like, that's going back to the first thing, which is sort of like

31:56

our nature is to just try to glom on to every piece of

31:59

information possible. But like there's just people with a lack of knowledge.

32:03

So let's say somebody said, hey, I'm gonna fight my instincts

32:06

to try to do this privacy by design. And now what I need

32:10

to do is I need to group users like the way you just

32:12

described to do K anonymization. How do I know how to set those

32:16

up so that they're gonna be realistic? Well, how do you know the

32:21

data you collect? I mean, first of all, you're making the decision at

32:24

a certain level of granularity anyway, like that's implicitly being done.
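For readers who want to make the k-anonymity measure concrete: it's the size of the smallest group of users sharing the same combination of segment values, and it can be sketched in a few lines. The field names, the toy data, and the "at least 5 or 10" threshold are illustrative, not anything from a specific tool.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records that share the same
    combination of quasi-identifier values. A common rule of thumb,
    echoing the discussion above, is to require at least 5 or 10 so
    that no combination isolates an individual."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Illustrative data: users bucketed into coarse segments.
users = [
    {"region": "EU", "age_band": "25-34", "device": "mobile"},
    {"region": "EU", "age_band": "25-34", "device": "mobile"},
    {"region": "EU", "age_band": "25-34", "device": "mobile"},
    {"region": "US", "age_band": "35-44", "device": "desktop"},
]

k = k_anonymity(users, ["region", "age_band", "device"])
print(k)  # 1: the lone US user is unique, so this data is only 1-anonymous
```

Coarsening the buckets (or dropping a field) until the minimum group size clears your threshold is the data-minimization move being described.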

32:28

Secondly, again, I just wanna step back. This isn't the main, the main

32:33

takeaway here really is about just at least being thoughtful about it.

32:37

It may be that you don't change your behaviors at all.

32:39

That may be totally fine. And in whatever context someone is working in,

32:44

it may be appropriate. One use case is, let's say you're in

32:48

a financial organization or healthcare, where you're in a regulated

32:56

industry, and

33:02

you have to collect the data anyway,

33:05

and that data is private, but you wanna do analysis.

33:09

There's this idea of sort of global and local

33:12

privacy that really comes from differential privacy. A global privacy is

33:17

where you have a trusted curator, right? And so

33:24

you have the data. Think, a good example of this would be the

33:26

US government and the census. So the data that's collected by the census

33:31

is extremely private information about citizens. And when that data is released,

33:38

it needs to be released in such a way that private information about

33:41

any individual is not leaked. And so in that case, the trusted curator

33:48

is the census bureau, but they have a mandate to release information for

33:53

the public. And so you could be in a situation where

33:57

you're an organization that has this information and you wanna do analysis.

34:01

So you might wanna release data to your analyst team

34:06

of the private data that has been privatized in some way.

34:10

And so one would be to use data minimization and this sort of

34:14

idea of K anon. But there's other approaches. There's differential privacy.

34:18

And so that's something I know, I just spoke at the PEPR Conference,

34:22

which is the Privacy Engineering Practice and Respect conference. And like there's

34:26

Meta is there and Google is there and whatnot. And they often have

34:30

situations where they collect data and they wanna

34:33

build tools or analytics on it. But they release internally data that

34:37

has either been subject to differential privacy or various data minimization

34:42

principles. So that's one of these. Can you define, can you, how easy

34:46

is it to give a high-level explanation of what differential privacy is and how it works? Well, I'm not an expert on

34:52

it and it's not super easy. But at the high level,

34:57

as far as I understand it, it's essentially,

35:01

I believe it's the one approach that actually provides privacy guarantees.

35:07

So you actually have a particular privacy guarantee around it. And the main

35:11

idea is that you inject a certain known amount of noise into the

35:19

data. So the data is perturbed by a certain quantity of noise,

35:25

which is defined by a, what's known as a privacy budget.

35:30

So basically you inject noise. It's usually either Laplacian noise or Gaussian

35:35

noise into the data set such that when a query comes back,

35:41

it's a noisy result. And so it essentially has certain guarantees that

35:48

any individual, you have a difficult time differentiating between

35:52

two data sets: one that has a particular individual in

35:56

it, and an adjacent data set that's the same, except it does not

36:00

have that individual in it. And whether or not the query results are

36:05

consistent with or without that individual. And so

36:09

that is probably terribly unclear to the listener, but the main idea is

36:13

that you inject noise, you inject noise into the data set.
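A minimal sketch of that noise-injection idea, assuming a simple counting query with sensitivity 1 and an illustrative epsilon: this is the basic Laplace mechanism, not anyone's production implementation, and it leaves out all the budget accounting discussed below.

```python
import math
import random

def dp_count(true_count, epsilon):
    """Release a count via the Laplace mechanism. A counting query has
    sensitivity 1 (adding or removing one person changes it by at most 1),
    so we add noise drawn from Laplace(0, 1/epsilon): smaller epsilon
    means more noise and more privacy."""
    scale = 1.0 / epsilon
    u = random.random() - 0.5  # uniform on (-0.5, 0.5)
    # Inverse-CDF sampling from the Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

random.seed(0)  # seeded only so the illustration is reproducible
print(dp_count(1000, epsilon=0.1))  # a noisy answer near 1000
```

The guarantee Matt gestures at is that the distribution of such noisy answers is nearly the same whether or not any one individual is in the data, so the output can't confidently reveal their presence.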

36:17

It's actually quite complicated. And at first it looks like amazing.

36:20

We took a look at it and we were thinking about doing it.

36:22

And I believe the census now is using differential privacy and it is

36:28

useful in a situation where you need to release a lump of data.

36:34

You need to release one particular query, like the census and they release

36:41

the results and they've applied a differential privacy mechanism to it.

36:50

It gets a lot more complicated when there's a lot of ongoing queries

36:53

on the data because there's a privacy budget and there's this idea of

36:56

composition, simple composition, advanced composition. It's somewhat related,

37:01

actually it's deeply related to Neyman-Pearson hypothesis testing.

37:05

And so these ideas about inflation of type one error rates and all

37:09

that stuff is not completely dissimilar to the idea of consuming privacy

37:14

budget and whatnot. And so it's not clear to me how one would

37:17

actually manage it in an organization and two, whether or not organizations

37:22

would accept noisy data. People kind of freak out about that.
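The privacy-budget bookkeeping being alluded to, under the simplest (sequential) composition rule where the epsilons of successive queries just add up, might look like the hypothetical tracker below; real deployments use tighter advanced-composition accounting, and the class name and numbers here are made up for illustration.

```python
class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition.
    Each query on the same data consumes budget, much like repeated
    hypothesis tests inflate the type I error rate."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        # Refuse the query if it would exceed the overall guarantee.
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return epsilon

budget = PrivacyBudget(total_epsilon=1.0)
for _ in range(10):
    budget.charge(0.1)  # ten queries at epsilon = 0.1 use up the whole budget
# an eleventh charge(0.1) would raise: the data can't be queried again
# without weakening the stated guarantee
```

This is exactly the operational headache mentioned above: once the budget is spent, an organization has to either stop querying or accept a weaker guarantee, which is a hard conversation to have with stakeholders.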

37:25

But there is this trade off of course, between privacy and

37:29

utility. But again, the interesting bit, I think the takeaway is one,

37:36

privacy by default is the law, at least in Europe and to various

37:41

degrees in different states. And what I found

37:46

can often be frustrating is that most of the privacy conversation is around,

37:51

again procedure and compliance. It's like you can't do this. And it's like

37:57

not productive. It's like, well, what, like help, give me some tools to

38:01

think about what we actually can do. Like if you care about outcomes.

38:06

And what I think might be of interest for the listener is

38:10

to look into privacy engineering, which is really

38:14

more a community and a set of approaches about design-based thinking to build systems

38:19

that have properties, privacy properties in them. And that gives a way forward

38:26

to actually build stuff and to build stuff that has these privacy properties

38:32

as part of them, as opposed to what I feel a lot of

38:36

the privacy conversation is about not doing stuff and people trying to like

38:41

block you from doing anything, very sort of bureaucratic in its approach,

38:45

very legalistic. And this is a much more engineering approach and really.

38:49

This whole conversation that we're having is really just about providing

38:54

an example of a company that has applied these privacy engineering principles

39:01

to their software. Now it's really gonna be up to everybody else to

39:04

decide when and where it's appropriate for them, but it is a way

39:09

to actually build stuff as opposed to just

39:13

not being able to do anything. So it's interesting, I never read the

39:18

seven principles, the Privacy by Design Seven Principles, until

39:22

prepping for this episode. And you, because you bring up principle number

39:25

two a lot, but principle number seven is the respect for user privacy

39:29

and keeping the interest of the individual uppermost. And I feel like that

39:34

may be a cudgel that I start swinging around. What I'm watching on

39:40

LinkedIn is people posting these diatribes: if you're not taking your

39:44

first party data and pumping it into this other system and giving it

39:48

to that, what are you doing? This is insane. And it's,

39:53

you quickly watch the comment thread. Some people say, yeah, I use my

39:57

tool to do that. You have other people arguing about the logistical complexity

40:02

of doing it. And then there's like a tiny little thread that is

40:05

saying, is that in the individual's best interest? Like everything about

40:11

that. Sometimes it is. I think you were using an example earlier that

40:15

if you need data from somebody in order to provide them something that

40:18

they want, it is in their interest to provide it. But that feels like another whole

40:24

tranche of the MarTech industrial complex that... There is nothing about

40:30

that principle number seven of keeping the interest of the individual uppermost,

40:37

which I think is another piece of that, that maybe just a little

40:41

another hobby horse I can mount and gallop around on. Yeah.

40:45

Well, seven and two I bring up mostly because it's privacy as the

40:50

default. That's key. I think that's the key bit is that it should

40:54

be the default. And I definitely think, one should not be getting their

41:02

guidance from the marketing tech industrial complex. Like that's a problem

41:09

because there's perverse incentives there. That industry is incentivized

41:14

to push, collect everything and magical thinking like people will sell a

41:20

magic box if people wanna buy a magic box. And I think that's

41:25

the antithesis, I think of being thoughtful and mindful about why you're

41:30

doing something. Unless the optics of buying a magic box have value,

41:34

that's okay. I don't... It's not for me to judge like, what is... Why

41:40

you're doing something? It's just one should have thought about why they're

41:43

doing something. But it feels like this way of thinking will end up

41:46

being more productive for people long term though. Because we are,

41:51

to your point, going to continue to run into

41:55

restrictions privacy wise. And I think people that are still holding onto

41:59

this idea that I have all this historical data and if I can

42:03

just look backwards and answer any question and understand each individual

42:06

and watch their entire path through my website, I'll be able to answer

42:09

any question, I need to make any decision about the business.

42:13

But it feels like if someone could let go of some of that

42:16

baggage of the way the industry and the story's always been told to

42:19

us. That you can start by saying like, what is the best question

42:24

to answer right now for the business to make a decision moving forward?

42:27

And what's a way to actually ask that and answer it looking forward

42:30

again by doing experimentation rather than trying to do a very complex historical

42:35

analysis. And then you can go about actually designing and engineering the

42:39

data again, moving forward. And I run into this so much with my

42:43

clients where I do feel like you just get stuck in the cycle

42:46

of looking backwards. That it is refreshing to hear that there are

42:51

tactical steps and a way of selling that forward-thinking mindset instead.

42:57

And seeing that it could be really freeing for probably a lot of

43:02

companies. I don't think it has to be experiments. I think you could

43:06

even have stuff that if you're not tracking something and they're like,

43:09

well what's going on here? It's like, well, we could just keep a

43:12

counter. We're at a physical store and somebody's saying, well, we wanna

43:16

know how many people are looking at... How many people look at produce

43:20

versus toilet paper. And one option would say, well we gotta have cameras

43:25

mounted. So we've tracked all of that so we can answer it just

43:28

in case if you ask that question. Or if all of a sudden

43:32

that becomes a very important question to answer, say,

43:36

cool, we're gonna take all that money. We didn't invest in this super

43:39

complicated tracking system that had to store everything and we're just

43:43

gonna, send some resources. It's gonna take me two weeks to answer the

43:47

question, but very, very precisely. 'Cause I know exactly

43:51

what you're looking at and it may not be even an experiment.

43:55

It does seem like a... It is such a radical

43:59

shift, like a change in... I'm not optimistic that we're gonna be able

44:03

to affect that sort of a shift because there are a lot of

44:08

pressures that don't want it. And it's to Matt, I think your point,

44:13

it's so easy to get sucked into the compliance mindset for privacy.

44:18

Well, what do I, my default is everything, what do I have to

44:22

turn off or what layers do I have to put on

44:25

so that I'm backsliding at a slower rate from what I'm used to

44:29

doing as opposed to... And you hit on it quickly, the simplicity

44:36

of the computation. Well there's a simplicity of if you have no data

44:41

and you have a really clear question and you say, what's the minimal

44:44

data I need to collect to answer that question? That in many cases

44:48

becomes a lot simpler for a lot of the questions. Now the problem

44:53

is, you're leaving a few questions that you could have answered otherwise,

44:56

I guess. And this isn't, and just to be clear you're not tied

44:58

to the old way they were collecting it. So many times you ask

45:02

a good question and the data they have on that topic is not

45:05

in a form you can even use. So I love though that this frees you up to say, how exactly do I need the data

45:10

to answer the question instead of, again, you're married to the baggage

45:14

of what's already been done. And they're like, well, I spent a lot

45:16

of time and money and effort. So you gotta figure out how to

45:19

use it. Also... That's a great point. And also, just to be clear,

45:23

this isn't like Gershoff's point, this isn't like me, this is like,

45:29

it's encoded in the law. That's what... It's Gershoff's law. No. Yeah. It

45:33

has nothing to... It is now. 100% It's not like I'm bringing this

45:37

to the table. It's like that's privacy by design is embedded in things

45:43

like GDPR, Article 25 and principle 5(1)(c), I think. So it's not

45:49

like I am suggesting that people do this special thing.

45:54

It's really, this is what's out there. This is part of the expected

45:59

behavior, especially at least in Europe, I guess. And what are some ways

46:05

that we might wanna think about it and, oh yeah,

46:08

also it is, I think supports this idea,

46:12

which I think is really the main point from my perspective.

46:16

Is that the value of... The value is not in this technology.

46:20

It's not in our software or other company software.

46:24

It's not in any statistical method or in the analytics method.

46:28

It's really about being thoughtful about what it is you're trying to do

46:33

and being thoughtful about what the customer might care about and being

46:38

explicit about how you're allocating resources and then thinking about things

46:42

at the margin. And a nice added benefit of thinking about data minimisation in

46:48

privacy engineering is that it is consistent with thinking that way.

46:54

That's really the main thing. I think that's what's nice about it is

46:58

that it helps us think through and be,

47:02

have clarity about why we're doing stuff. What you wind up doing

47:07

is not for me or any of us to say it's really gonna

47:10

be ultimately for everyone in whatever context they're in.

47:14

That's all. It's really just calling that out that

47:17

we can actually have sort of outcomes. One of the... It's not gonna

47:22

be my last call, but it's Jennifer Pahlka who wrote Recoding America.

47:30

There's a really good episode with her on Ezra Klein's podcast.

47:36

And I think she has great clarity on where she talks about

47:41

procedural thinkers and outcome based thinkers. And I think that's a really...

47:47

She kind of frames it in a way that I think about all

47:51

the time and a lot of privacy conversation is really procedural.

47:54

It's like, have you followed this process? Have we have we hit the

47:59

check marks? Yeah. Great. But it's sort of like, it doesn't tell you

48:03

how to do anything. It doesn't tell you about how to improve your

48:07

outcomes, whereas the privacy engineering side of things is really outcomes

48:10

based. It's like, how do we actually do stuff? And I think

48:14

the one thing that is the theme that runs through analytics and marketing

48:20

analytics specifically is about outcomes. We really should be caring about

48:24

outcomes and actually being productive. You can say that it's not you saying

48:30

this, but as you're saying that, I think you're pointing it out,

48:34

but if you look at all of the hand-wringing around

48:40

GDPR and different kind of privacy legislation in Europe, and then they're,

48:46

oh, these countries are saying that their interpretation is Google analytics

48:50

is not valid. As soon as that sort of becomes

48:55

the debate, it becomes the regulators don't understand

49:00

digital and that's not reasonable. And let us rationalise why

49:06

the way that we're doing things is fine.

49:10

So that then, that just sucks all the oxygen out of the conversation

49:14

is what's the ruling gonna be as to whether this platform is allowed

49:19

in this region based on this argument. And it feels like it just

49:26

by default moves four steps away from the underlying

49:30

intent and the principle and then has a debate kind of in the

49:34

wrong space. Where you're pointing out that like, no, no, no,

49:38

where it started is valid and let's not rip it away from there

49:44

and go have an argument somewhere else that's already missed the point.

49:48

Yeah. And you don't have to be part of that argument.

49:50

That's like... You don't... That's a decision that you make. Like is that

49:58

what you care about? It's not what I care about. And so

50:02

we just wanna make good product and that's respectful of our users and

50:07

is consistent with some of these principles. And it has some nice benefits

50:12

and we're just, I'm chatting with you all right now is really like

50:16

here A is an example, and then also B. Again,

50:21

making sure we just don't just mindlessly collect data. Now there's a reason

50:26

to push back on that is that privacy or data minimisation is the

50:31

default. And so you make that what you will. It's really gonna be

50:36

up to everyone else, but I think it's valid just to sort of

50:40

point it out. But yeah, there's a lot of nonsense out there,

50:43

Tim. So what? There's a lot... There's... I mean if you're getting your

50:51

information from LinkedIn primarily what's LinkedIn? It's like a lot of

50:55

people like self promoting their stuff and people like, are they really

50:58

experts? You look at it, a lot of people aren't

51:01

and there's a lot of nonsense multipliers. There's a lot of agencies out

51:06

there. People just, you gotta step back and think about what the perverse

51:10

incentives are and there's a lot of perverse incentives out there and

51:15

a lot of folks are selling product and are selling services.

51:20

And what is new often is something that they can use to sell.

51:24

And I just think by being, again, I don't wanna overuse the word intentional,

51:29

but just being thoughtful and mindful is a

51:34

protection against acting in a way that isn't rational and you can

51:40

bump up what they're saying to see if it's sort of consistent

51:43

with what your actual needs are. And again, I sell software and so

51:47

people can be... I have my biases as well and so

51:54

I'm well aware of that. But again, this is stuff that is not made up by us, by me.

52:02

It's kind of the law and just a way of thinking about it.

52:06

But again, we're not selling, there's no one way to do things and

52:10

we're not being paternalistic about it. It's not for me to say or

52:13

any of us to say how others should... Well, you all are, some of you are, consultants.

52:17

So I guess it is kind of for you to give guidance.

52:20

But it's ultimately... The way we look at it, it's our job to

52:25

give... It's almost like being a doctor and there's various treatments and

52:30

we may have a preference about what we think a type of treatment

52:33

works, but it's ultimately up to the client to think through what are

52:39

the trade offs between different interventions? And does one approach

52:45

work better for them? They are in a better position to know.

52:48

It's just really our job to give them options and ultimately if they

52:54

do something they wanna do an approach that isn't what we would've done,

52:58

that's totally fine. It's not for us to say. It's just our job

53:02

to give, to be acting in good faith and kind of give them

53:05

options. I love that we've got this conversation done now 'cause I think

53:10

we're gonna be referring to it again and again and again over the

53:14

next many years. This is good on a lot of levels

53:19

for a couple reasons. One, because when we start seeing vendors in five

53:23

years talking about this, we'll know where it came from.

53:28

And as we sort of seek out and pursue sort of almost like

53:34

a new set of first principles as analysts around how incorporating privacy

53:39

in a proactive manner works. It's starting at this sort of juncture.

53:44

It's a lot of food for thought. All right. This has been outstanding

53:49

as per usual and thank you Matt. Thank you very much.

53:55

Well thank you so much for having me. It's been a real pleasure. It's good.

54:00

I've got a lot of thoughts going on as I usually do when

54:03

we talk and none of them are very well formed and most of

54:07

them probably don't make any sense. So it's gonna take a while. But

54:10

this is really good and I think I echo what you were saying,

54:13

Julie, which is sort of like, this is the first time I've sort of looked

54:17

at privacy stuff and not felt sort of like this,

54:20

oh, they're just crushing our fun and we have to follow all these

54:23

rules. There's now sort of like, okay, there's a path forward and I

54:26

can get excited about that. Now I'm intrigued and I wanna go learn

54:31

more about how do I incorporate that as part of a central part

54:34

of my path out from here. Which I think is. Yeah. Can I

54:38

just say, I do, to echo that, Michael, I started to feel at

54:44

the very end I was starting to culminate all my thoughts finally into

54:47

something coherent of, I really like that this way of thinking gets rid

54:52

of the fear of feeling like they're losing something with the privacy

54:57

laws out there and the new regulations coming. Because I feel like that's

54:59

what the conversation is always about: we're losing this, we're losing

55:04

that, oh no, you wanna hold on tighter because you feel like things

55:07

are being pulled away from you. But this kind of breaks that fear

55:10

cycle and, yeah, it feels kind of like a new day.

55:13

Like, oh, turn the page. There's a new way to start.

55:16

You can start fresh, it's okay. None of our tools support it yet,

55:20

but then we can start going and building that future. No.

55:23

Not yet. Come on. Come on. Yeah. There might be one. That was quick. That

55:26

was a quick... That took all of 43 seconds. As always, somebody's been

55:37

thinking about this back in 2015. Oh. Like I said, in five to seven

55:42

years when some of the vendors start talking about this, you know where

55:45

you heard it first. All right. One thing we would love to do

55:49

on the show is go around the horn and share a last call.

55:51

Something that might be of interest to our audience. Matt, you're our guest.

55:54

Do you have a last call you'd like to share? Sure.

55:56

Actually, is it okay if... I have a couple. Yeah. Go for it.

56:01

One is, since we were talking about this, and I just wanna be clear that I am sort of adjacent to

56:07

it. I'm not an expert in the privacy engineering space, but there are

56:11

experts there. It's just amazing community and I highly recommend anyone

56:16

who's interested in any of this to attend PEPR, which is the Privacy

56:19

Engineering Practice and Respect conference. It just happened last month

56:24

and it's coming up next year. But I highly recommend folks,

56:27

and I can give you all a link if you wanna put that

56:30

on the page for the podcast. Really some of the most inclusive...

56:35

Which actually that's, is it through, that's for your stake years. So we're

56:39

gonna... We'll link to the talk you did there; it's available on YouTube,

56:43

right? Yep. It's that conference and really, it's some of the smartest people

56:47

you've ever met and also some of the warmest and most inclusive community.

56:54

It's very Star Trek rather than Star Wars vibe. So it's great and

57:02

then kinda more literary but sort of think, we talked a little bit

57:06

about cardinality and sort of ideas of information and whatnot is kind of

57:11

the... I recommend the short stories of Borges, the Argentinian

57:17

writer, The Garden Of Forking Paths and the Library Of Babel, those are

57:23

two of his short stories. And I think if you wanna be like

57:28

in the know data scientists, like sort of a literary data scientist, those

57:32

are two good short stories to have read. And then once you start

57:35

reading those, you'll get hooked. So that's my last call. Wait. I assume

57:40

it will make it through the editing, but I was introduced to the

57:43

Library Of Babel by Joe Sutherland as we were working on this book.

57:46

So we have a whole... It's actually in the book that we're working

57:49

on as an explanation and illustration of the Library of Babel.

57:53

So I should actually read the short story I guess, instead of just

57:57

the Wikipedia entry. Oh, no, it's great. Yeah, you should read both. And

58:00

definitely Garden of Forking Paths, which is often referenced in

58:06

research design, which is, people refer to that when talking about researcher

58:10

degrees of freedom and reproducibility of studies and whatnot.

58:16

So there's a lot of the ideas that are adjacent to what we

58:20

work on are embedded in these great short stories. Very nice.

58:25

All right. What about you, Julie? What's your last call? My last call

58:30

is actually inspired by a previous show not long ago with Katie Bauer. I

58:37

was looking through some of her different articles and I came across one

58:40

that was titled Deciding If A Data Leadership Role Is Something You Actually

58:44

Want To Do. It was an interesting read overall, if that's like a

58:47

point in your career that you're at, but I just felt like she

58:52

broke it into a lot of helpful ways that she thought about making

58:56

a decision about what next role she wanted.

59:00

And she talked a lot about, titles in ways she thinks about your

59:04

titles, which I think a lot of people run into that at different

59:06

points in their career. So I thought that was just a great way of

59:10

framing it. She then listed a bunch of great questions that she actually

59:13

used when going through interviews for different roles and I kind of started

59:18

to think about how I feel like they would be super helpful,

59:22

even me as a consultant thinking about asking my stakeholder or can I

59:26

ask or can I figure out the answer to these types of questions

59:29

with like where my stakeholder sits in their org, what is their actual

59:33

job, what is their role compared to their peers? What is their manager

59:37

like, who are they working with? What are their relationships like?

59:40

And she just outlined a lot of different great scenarios of how data

59:43

teams fit within organizations. And so whether you're using those questions

59:47

to ask when you are interviewing for new roles or like I said,

59:50

I'm kind of inspired to use them in different scenarios. I thought it

59:54

was a great read. Excellent. All right, Tim, what about you?

59:58

So I feel like I'm gonna be pulling some of these is we've

1:00:01

turned in the initial full draft manuscript for the book, which means I've

1:00:05

learned a few things that I'd either forgotten or were new things coming

1:00:10

out of the brain of Joe Sutherland. And

1:00:13

one of them is, it's an oldie but a goodie. It's kind of

1:00:17

an academic paper published on the National Library of Medicine at the NIH

1:00:23

and the paper is titled, Parachute Use to Prevent Death and Major Trauma

1:00:28

Related to Gravitational Challenge: Systematic Review of Randomized Controlled

1:00:32

Trials. So it's from 2003 and it is a brief academic paper where

1:00:39

these two people who basically kind of dared each other, the notes at

1:00:42

the end kind of explain, hint at what happened. But basically they were

1:00:46

looking, saying if scientific evidence really requires a randomized controlled

1:00:51

trial for high stakes things, then surely we should just go into a

1:00:55

survey of all the randomized controlled trials around the efficacy of parachutes.

1:01:00

And the result... They had a whole plan on how they were gonna

1:01:03

find the outcomes and their meta analysis and what they were gonna do.

1:01:06

And the results are that our search strategy did not find any randomized

1:01:09

controlled trials of the parachute. So it's kind of a little bit of

1:01:13

poking fun at the scientific community, but in a kind of a delightful

1:01:18

way with some pretty funny footnotes. And it actually did get kind of

1:01:25

published in a way. So it's just kind of a good reminder of

1:01:29

being clear on the question you're trying to answer and what

1:01:32

your options are for answering it. So that's random.

1:01:37

What about you, Michael? What's your last call? Well, it's interesting.

1:01:41

I had a conversation recently with my niece who's getting ready to start

1:01:44

the school year and she's taking an AP statistics class, which I didn't

1:01:49

even know that kind of class existed in high school.

1:01:51

But we started talking about some of the pre work that she got

1:01:54

assigned and I realized I was like starting to explain some foundational

1:01:59

statistics concepts, that she was kind of like struggling with. And it reminded

1:02:03

me of this book I read early in my career called The Cartoon

1:02:06

Guide to Statistics. 'Cause whenever I go back to sort of those first

1:02:10

things, I'm always reminded of that book, which I got recommended to me

1:02:14

actually by Avinash Kaushik way back in the day. So that's my last

1:02:18

call. I think I may have done it before, but it's been many,

1:02:21

many years. And that conversation sort of brought it back up.

1:02:24

So if you're getting into statistics or you just wanna have a better

1:02:28

foundation in statistics, that's actually a great book to have on your shelf

1:02:32

to pull off and read. And some of the stuff we talked about

1:02:35

today, I kept up with because I've read that book and it's a

1:02:40

cartoon so it's easy. So anyways, Cartoon Guide To Statistics. That's funny.

1:02:44

There you go. It's on my shelf and I never could make it

1:02:46

through it. I should. I should go back and read it now.

1:02:49

I feel like I was... Didn't... Yeah. I should try it again.

1:02:52

It probably would make more sense. Yeah. 'Cause you... What was funny was

1:02:57

how much I realized I'd actually learned over the years about statistics

1:03:01

in just trying to explain a couple things.

1:03:04

And I realized like, wow, I actually know a couple of things about

1:03:07

statistics now, which I think that's important I should know. But it's...

1:03:11

And I think, if we're being honest, all due to the Conductrics quiz.

1:03:15

Oh yeah. Absolutely. Absolutely. Full circle. It's a full circle moment.

1:03:21

A 100%. Well, yeah, this has been obviously such a great conversation and

1:03:25

I know as you're listening, you may have questions, you may have input,

1:03:29

there's things you might wanna share that we would love to hear from

1:03:32

you. And the best way to do that is through the Measure Slack

1:03:35

chat community, or as much as... We're on LinkedIn as well.

1:03:40

And also you could email us at contact@analyticshour.io and I think,

1:03:46

Matt, you're pretty active on that community as well as on the TLC.

1:03:50

Yeah. Highly recommend folks sign up for the Test & Learn Community run

1:03:55

by Kelly Wortham. That's a great space to learn about all things experimentation

1:04:01

in an inclusive space. Yeah, absolutely. And we heartily recommend it as

1:04:07

well. And it's a great place to explore these ideas and keep this

1:04:11

conversation going as well. So love to hear from you and

1:04:17

keep learning more about privacy engineering, privacy by design, k-anonymization,

1:04:22

differential privacy, I mean all new and amazing concepts for me today.

1:04:27

So awesome. All right. And of course, no show would be complete without

1:04:32

a huge thank you to Josh Crowhurst, our producer for all you do

1:04:36

behind the scenes to make this show happen. We thank you very much,

1:04:39

sir. And of course, thank you Matt so much for coming back on

1:04:44

the show. It's always a pleasure. Makes me reminisce about all the awesome

1:04:48

times we've had at SUPERWEEK and other places. It's always a delight to

1:04:52

hang out and talk. Thank you so much for having me.

1:04:55

I really appreciate you all welcoming me back and it was great to

1:04:58

meet you, Julie. Yeah, you too. Awesome. And I think I speak for

1:05:04

a random assortment of co hosts that I may have,

1:05:08

that I've incremented a couple of times when I say, no matter how

1:05:12

you're trying to drive forward with privacy, remember,

1:05:15

keep analyzing. Thanks for listening. Let's keep the conversation going

1:05:21

with your comments, suggestions, and questions on Twitter at @AnalyticsHour,

1:05:26

on the web, at analyticshour.io, our LinkedIn group and the Measure

1:05:32

Slack group. Music for the podcast by Josh Crowhurst. So smart guys want

1:05:38

to fit in, so they made up a term called analytics.

1:05:41

Analytics don't work. Do the analytics. Say go for it, no matter who's

1:05:45

going for it. So if you and I were on the field, the

1:05:48

analytics say go for it. It's the stupidest, laziest, lamest thing I've

1:05:53

ever heard for reasoning in competition. Text was like, Tim and Mo were

1:05:59

supposed to be cool, almost like secret agents and like just had their shit

1:06:03

together. And Michael was just kind of like, did you ever see, what's that

1:06:07

movie with Matt Damon and Alec Baldwin? And it's like all Boston and

1:06:13

Wahlberg. And there's that scene where Alec Baldwin is like the police commissioner

1:06:18

and he's all like frantic and he's sweating and he's just like, totally

1:06:21

discombobulated. That was how I thought of Michael, which just like totally

1:06:26

out of sorts, just... And, then Tim and Mo would just kind of come

1:06:31

in and just be like cool cucumbers and like, just have their shit together.

1:06:35

And Michael never played it correctly. And he edited it out.

1:06:39

He wouldn't say... Oh, but anyway. I sent... I had a dialogue for

1:06:47

him. No. That was the whole bit. Oh, man. But how did you really feel? But

1:06:56

Michael, I can't believe, like I thought he would just like lean into

1:06:59

it, but no, he was too embarrassed or he like didn't like,

1:07:02

he's like, his ego was too great to play. He just didn't commit. Yeah. He

1:07:06

just didn't wanna play it. I think, he just couldn't play it up.

1:07:08

He's like, I'm too serious for this. I'm not gonna be the one

1:07:11

who doesn't know what's going on. Well, you're not the one who's answering

1:07:13

the questions. That was the whole point. I didn't understand the vision.

1:07:17

But I just didn't understand the vision. I'm not cut out for high level

1:07:24

acting. Julie picked up on it. Julie picked up on it.

1:07:27

That was... No, Michael said that verbatim in one of the episodes.

1:07:31

He literally stopped midway into the quiz and he goes, why am I

1:07:34

always panicking? Why am I so frantic in this? That's the whole bit. That

1:07:37

was like the narrative theme. Mo and Tim were just like the 007s. Rock
