Episode Transcript
0:05
Welcome to the Analytics Power Hour. Analytics topics covered conversationally
0:10
and sometimes with explicit language. Hi everyone, welcome. It's the Analytics
0:15
Power Hour and this is episode 253. Almost every time I've attended the
0:23
SUPERWEEK conference in Hungary over the past seven years, a major theme
0:28
is how much our industry is changing. And lately, especially, it's privacy
0:33
regulations and new laws that impact our industry. And the other thing I usually
0:38
take from that conference are new ideas about where the industry's heading
0:42
and how we adapt to these changes. And I think
0:45
this conversation on this show will be similar, I think, in a lot
0:49
of ways. And with new constraints on how, when, and where we collect
0:54
and store data, it's high time to embrace new paradigms where we can
0:59
find new ways of thinking about data collection and usage in a privacy
1:05
first world. So let me introduce my co-hosts, Julie Hoyer.
1:09
Welcome. Hey there. That's awesome. And Tim Wilson, who has been with me
1:14
many times at SUPERWEEK. Welcome. Thought we weren't gonna talk about that.
1:18
It's on video. What happens at SUPERWEEK stays at SUPERWEEK? Stays at SUPERWEEK.
1:23
Well yeah, we won't talk a lot about it. All right. And
1:29
I'm Michael Helbling. And our guest today needs no introduction, but let
1:33
me do a little bit. He is the CEO of Conductrics, an amazing
1:37
thinker and speaker, and is our guest again for the third time.
1:40
Welcome to the show, Matt Gershoff. Thanks for having me. A real honor.
1:45
It's awesome. I'm thankful to have you too. And actually, as I was
1:48
thinking about it, I was like, well, you've been there at most of
1:50
these SUPERWEEKs as well. And your company Conductrics sponsors that event.
1:54
And I remember that very fondly. I mean, seeing you there all those
1:59
years, it's also really fun. But one thing, Matt in this topic specifically,
2:06
you consistently in our industry, for me anyway, are usually one of the
2:11
people who are sort of five years ahead of what a lot of
2:16
people are talking about. And so I think it's really interesting that one
2:20
of the things you're really talking about now around sort of this world
2:24
of new privacy laws and things like that is about adopting a mindset
2:28
of just in time or just enough data or sort of privacy first
2:33
mindset around data. And so kind of maybe as a starting point,
2:37
what got that going for you, and when? And what kind of spurred that
2:41
as sort of a major area of thinking and writing for you over
2:45
the last couple of years? Sure. Well, first, thanks for having me.
2:48
And thanks for having me, Tim and Julie. This is
2:52
gonna be fun. Looking forward to it. Well, actually just to step back
2:56
a little bit, the work that we've been looking into
2:59
and working on within Conductrics around privacy engineering and data minimization
3:06
is really less about privacy per se, and really more about
3:12
thinking about why we're doing analytics and experimentation in the first
3:19
place. And so I think for us, we have a slightly different view
3:25
of the value of experimentation. And just so that the listener understands
3:30
where I'm coming from, is that Conductrics is in part an experimentation
3:35
platform where you might do A/B testing and multi-armed bandits and that
3:40
type of thing, where you're trying to learn
3:42
basically the marginal efficacy of different possible treatments.
3:46
And for us, we really feel like the value of experimentation
3:54
is that it provides a principled procedure for organizations
4:00
to make decisions intentionally, to make them explicitly,
4:05
and to consider the trade-offs between competing alternatives. And ultimately,
4:11
the reason for doing this is to act as advocates, sort of the
4:15
front line for the customer. And so we have a much more,
4:21
I guess, hospitality or omotenashi approach to why experimentation, why
4:29
one really should be doing experimentation. And I think that's true of analytics
4:33
more generally. It's like, really, why are we doing it?
4:37
And I think one of the issues that I've seen in the,
4:41
I don't know, almost 25, 30 years that I've been in the analytics space
4:45
is that sometimes analytics tends to, kind of, lose that focus.
4:53
And we tend to have programs that become
4:57
almost ritualized. So we sometimes start doing behaviors
5:02
just to do them, and we kind of lose the focus of
5:06
really why and what the ultimate objective is. And so for us,
5:11
part of the reason why privacy engineering and data minimization is something
5:18
that we've gravitated towards was, one, part of that is really about respect
5:22
and being customer focused. But also, two is that it really forces one
5:29
to think intentionally. And we ask the question,
5:33
what is sort of the marginal value of the next bit of data?
5:38
Like, why should we collect this next piece of data or the added
5:42
data? And to really have some sort of editorial and expertise about why
5:48
we might be getting more information about the user when we might not
5:53
really need it in the first place. And so this idea of intentionality
5:56
is really what underpins both experimentation for us as well as why we
6:03
were interested in moving towards having a more
6:09
data minimization approach to the experimentation platform. So you said
6:13
sort of the ritualized behavior, which, as I recall,
6:18
you came up with two mindsets and then you added a third. You said, oh,
6:24
there are these mindsets of data, I wanna get data just in case,
6:31
just in case I need it. And that, I think, falls under that
6:33
kind of ritualized behavior, gather all the data,
6:39
not considering the incremental value of it. And you contrasted that with
6:43
just in time. And then you added like just enough, I think,
6:47
a little bit later. But does that fit that we're kind of making
6:53
a broad generalization in analytics? And I think even in experimentation,
6:58
there's a tendency to say that next bit of data,
7:03
the cost to collect it is near zero. So let me collect it
7:07
just in case down the road. And that just kind of balloons out
7:12
that you add on a million additional data points. And now you're just
7:16
in the habit of just collecting everything and sort of
7:20
lost the idea that you're actually trying to figure out what you're doing
7:24
with it. Yeah, that's a good question. That's a good comment.
7:28
Really, what it is, is that if you think about it,
7:32
the GDPR and data privacy, most of that conversation has been around compliance.
7:38
Which is about what you can't do. And a lot of that is really
7:42
sort of a procedural thinking, like do you follow certain procedures for
7:46
risk mitigation? And really what I think the privacy legislation is really
7:52
about is to encourage privacy being embedded in technology, being embedded
7:59
in processes by default. It's not that you shouldn't collect
8:04
data if it's required. It's not that if you have a task
8:07
and you need the data in order to achieve the task,
8:11
no one's saying that one shouldn't collect that. It's really about asking
8:15
for a particular task, whether or not the data
8:19
is pertinent. And it's about being sort of respectful to users and not
8:22
collecting more than what's needed. Now that privacy by default
8:27
is in contrast to what I think a lot of the thinking had
8:32
been or currently is in sort of analytics and data science,
8:36
which is really a data maximalist approach, which is
8:41
collect everything by default. And again, as you say, the sort of the
8:45
marginal cost of the next level of granularity, right? So we can think
8:51
of more data as being finer and finer levels of granularity for any
8:56
particular data element, or it could be additional data elements
9:02
and it can also be additional linkage. And so that's sort of that
9:06
whole 360 and so that every element or event can be traced back
9:11
or associated with an individual. So you kind of have those three dimensions
9:16
to expansion of data. And so what I was really trying to point out
9:21
is that a lot of that data collection
9:26
is somewhat mindless. It's just that just in case, and underpinning it,
9:30
there isn't really an explicit objective, right? We're not, we don't have a
9:34
particular task where we're collecting data for a particular purpose. Like
9:40
in an experiment, I was talking about just in time is because we
9:44
have the task. I need to know the marginal efficacy of one
9:49
treatment over another, one experience over another. And so then I need
9:52
to go out and collect data for that task
9:56
versus just in case, which is really, I don't know what the question
10:00
is that I'm gonna ask, but I'm gonna collect it anyway.
10:04
Now, why am I gonna collect it? Well, really there's sort of a
10:07
shadow objective, which is one based upon magical thinking, which is
10:13
all of the value is in that next bit. It's almost like the
10:17
gambler who's at the table when they're losing and they just have to
10:22
believe that the next hand is where the big giant payoff is.
10:26
That often gets rationalized in data science and venture land as sort of
10:33
fat tails, right? And so there's some sort of huge, there's huge payoffs
10:37
out there lurking in the shadows and you just need to have reached some
10:41
sort of threshold of critical mass in order to achieve it.
10:46
And I'm not saying that that doesn't exist, but it's unlikely that it
10:50
exists in the probabilities that people think. So that's one side of things,
10:57
which is this magical thinking that all the value is in the data
11:00
that I haven't collected. And then secondly, it's about minimizing regret.
11:04
So it's like, well, I don't wanna not have collected it in case
11:08
I need it in the future. My boss asked for it.
11:11
And so we collect it. And that's sort of collection by default.
11:15
And that is not consistent with the privacy by default. And that's really
11:24
the law. And so that's not to say, though, that discovery
11:30
isn't something that's also important. So it's not about being paternalistic
11:34
and saying, don't collect data or there's a certain way that you have
11:39
to do it. Really, all we're talking about is just being thoughtful about
11:43
it and being intentional. So it's like, hey, I think perhaps,
11:47
the company may think or you folks may think that,
11:50
hey, for this particular company or a client, if they had X data,
11:54
then they could solve tasks A, B, C, D, X and Z,
11:59
whatever. And that seems totally reasonable to me. Then you have a reason
12:05
to go collect that data and then check: okay, well, does it look
12:07
like this data is informing these decisions or helping us make decisions?
12:12
But that's entirely different than just collect everything.
12:16
And I think that just in case collect everything, one, it being mindless,
12:21
there is no objective to having it other than to have it, really
12:25
opens organizations up to grift. The sales pitch, which is
12:31
can you afford not to collect it? A lot of that stuff.
12:35
And that's prevalent in our industry. And so I really think
12:39
it's really about being mindful. And it's really about
12:44
this idea that the real value is not in the data or in
12:48
any statistical method or any technology. It's really in the editorial
12:53
and the expertise and really the taste. It's like, does the company have
12:56
taste to be thinking about what is gonna be useful for their customers
13:01
and to be cognizant of what the customers need or have empathy for
13:05
them and to be using information about them in a way that's respectful?
13:09
That's really all, that underpins all of this.
13:16
It's time to step away from the show for a quick word about
13:19
Piwik PRO. Tim, tell us about it. Well, Piwik PRO has really exploded
13:24
in popularity and keeps adding new functionality. They sure have. They've
13:28
got an easy to use interface, a full set of features with capabilities
13:33
like custom reports, enhanced e-commerce tracking, and a customer data platform.
13:38
We love running Piwik PRO's free plan on the podcast website,
13:42
but they also have a paid plan that adds scale and some additional
13:45
features. Yeah. Head over to piwik.pro and check them out for yourself.
13:50
You can get started with their free plan. That's piwik.pro. And now let's
13:55
get back to the show. Well, it's funny, too, that
14:00
working with a lot of clients that do the just in case collection,
14:04
because, again, it is widespread. It's the norm across the industry,
14:06
I would say. I have run into so many situations where we go
14:11
and they ask a very important business question and we start with like that
14:14
question first and then they say, and we have all this data that
14:17
we can pull in and we have so much we should be able
14:19
to answer this. No problem. And time and time again, I start getting
14:23
into like the actual requirements of what the data needs to be able
14:26
to do to answer this great question. And then we find out that
14:29
even though just in case they've been collecting all of it,
14:31
it's not in the right structure or things can't be joined the right
14:35
way, whatever it is between the tool and the actual data structure itself,
14:39
we can't answer the question they care about. And so it would still
14:42
be then defining in that moment going forward, like, what do we actually
14:46
need to be collecting for you to answer this business question?
14:50
And it's funny because one of the examples I had was actually working
14:53
in Adobe Analytics, or actually Adobe CJA. And we were bringing in a
14:58
data set from, let's say, like Salesforce. And I started to have this
15:02
conversation with my stakeholders saying, you're asking great questions,
15:06
but you're asking questions that we're used to being able to ask the
15:10
data that would come in through Adobe that we were used to for
15:12
years with Adobe Analytics. And now you have this data coming in from
15:16
Salesforce, which was structured and designed to answer different types
15:20
of questions. And so they don't map perfectly together. And so now we're
15:24
starting to talk to them about how could we rework this and actually
15:28
bring in the data in a way to answer the questions you care
15:31
about and that your stakeholders coming to you actually need.
15:36
Yeah, the main thing is to be intentional. Now, but to be fair,
15:38
like some of those companies that you've mentioned in the past,
15:42
they were sort of masters of this collect everything
15:46
and magical stuff is gonna happen. And then all of the use cases
15:50
wound up being error handling because the site was broken. And so
15:56
that's not really a community that has been
16:01
totally innocent of maybe overselling collecting data. I mean, data is not
16:09
information. And I think it's important to think about
16:13
kind of like the entropy of what you've collected, like how compressible
16:19
is the data? And so a lot of times you have data,
16:23
but it's not information. It doesn't help you reduce uncertainty in a particular
16:31
question that you're asking. And that's what information does. And just
16:35
because there's bits being collected does not mean there's more information.
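To make the compressibility point concrete, here is a minimal sketch in Python; the function and sample columns are invented for illustration, not anything from the show. A column can hold many bits yet carry almost no information:
```python
import math
from collections import Counter

def shannon_entropy(values):
    """Entropy in bits per value of a categorical column.

    Low entropy means the column is highly compressible:
    lots of bits stored, little information carried."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Ten thousand rows that almost all say the same thing carry
# almost nothing, no matter how much storage they take up.
print(shannon_entropy(["US"] * 9_990 + ["CA"] * 10))  # ~0.01 bits per value
print(shannon_entropy(["A", "B", "C", "D"] * 2_500))  # 2.0 bits per value
```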
16:41
Well, and it feels like my concern is that it's already a problem.
16:46
It already is. And you said it was kind of the laziness
16:49
of avoiding thinking, of saying, well, just collect everything. I mean, the
16:52
number of times that I've got experiences where somebody said,
16:57
oh, the data collection requirements are pretty straightforward. Just collect
16:59
everything. And it's like, well, no, that's lazy and simple for you to articulate.
17:04
It's actually showing that you're not thinking through what you're going
17:07
to do. I feel like we've been in that mode with lots of
17:12
forces sort of pushing that idea, that idea of I wanna have the
17:18
option to look at this data and hopefully it's structured well
17:23
with a chunk of the world of AI and
17:28
the next generation of the technology vendors jumping on that train or kind
17:33
of spinning the, well, to do AI, like the more data,
17:37
the better. And there, we're running out of data already to train the
17:41
models. And I'm afraid that's pouring kerosene on a raging,
17:48
poorly functioning fire already that now people get to wave their hands
17:52
and say, I'm doing this for the future of AI. It's just like
17:56
the next level of a lack of intentionality of
18:00
surely if I get even more data, then the AI will be able
18:05
to kind of run through it. But it's really just amplifying,
18:08
I think the same problem that you articulated when
18:13
very clear and concise questions may mean that you need to collect a
18:21
very small amount of data for the next
18:25
month, as opposed to you've got boatloads of data you've captured for the
18:30
last five years that actually aren't that helpful,
18:33
but you're gonna force yourself to go wade through that, trying to do
18:36
something that if instead you had intentionality and said, I'll just go
18:39
forward, like having that historical data, it actually makes it harder to
18:44
have the discussion of what's the best data to collect just enough of
18:51
just in time to answer that question. Oh, that's that new data.
18:55
And it's like, well, new data, what are you talking about?
18:58
We have this ocean of data. What can you do with that?
19:03
Well, what I can do with that is a much more complicated,
19:05
messier, actually less good way of answering the question.
19:10
But yes, we're checking off the box that you can point to
19:14
your just in case mindset is having, helped me answer a question.
19:18
It actually wasn't the best way to answer the question in many cases.
19:22
Yeah, and I get so many times like, just do what
19:25
you can with the big messy historical data that we just in
19:28
case captured. When I tell them like, oh, well, to really answer this,
19:32
yeah, maybe it should be different data looking forward in a test.
19:36
And they're like, eh, yeah, well, we don't wanna do that. So what's the best you can give us from the other stuff?
19:40
Yeah, and just to be fair, I didn't use the word lazy.
19:44
I just think maybe just unaware. Yeah, I mean, I just think it's, I
19:50
think the value is in being aware and being explicit. That's what I
19:54
think data teams and companies should be doing.
19:58
And I think that's where the success is. And it's not in doing
20:02
analytics. It's analytics in the service of having
20:08
a well thought out understanding and model of
20:12
the customer and the environment that you're in. But this, again,
20:15
this isn't to be paternalistic and saying, I don't know, it's not for
20:19
me to say what companies in particular context should be doing or shouldn't
20:23
be doing. I just know for us, when we re-architected the software
20:28
back in 2015, we were aware of GDPR, and we read up on
20:33
privacy by design, which are principles that, I think, came in the mid '90s from
20:39
Dr. Ann Cavoukian, I believe. And there's seven main principles. And the
20:44
GDPR and other privacy frameworks have incorporated those principles into
20:52
their legal frameworks. And one of them is principle two, which is privacy
20:58
by default. And I think principle three or four might actually
21:04
be about embedding. And this idea is that the software and systems
21:09
should have these, should be privacy by default, by design, and it shouldn't
21:13
be like a bolt on. And so customers should be able to use
21:16
the services by default in a privacy preserving way.
21:21
And it's really only in cases, you need to like move up from
21:26
the default as opposed to the current approach, which is collect everything
21:30
and moving down from that. It's really inverted and it really should be,
21:35
you should be collecting as little as possible to solve the task.
21:37
And we just realized that actually experimentation at least, and I'm not
21:42
saying everything, but at least in experimentation, many,
21:47
if not most, and actually most of the tasks in A/B testing experimentation
21:52
can be done following a data minimization principle, which means we really
21:58
do not need to link all the information together. We do not need
22:01
to collect IDs. And we can store data in what are known as
22:07
equivalence classes. You can kind of think of that as like a pivot
22:10
table. And so the data is stored at basically an aggregate level.
22:15
But even though the data is stored in an aggregate way,
22:19
which allows us to use ideas from privacy approaches such as k-anonymization,
22:26
we can talk about that if that's of interest, we kind of use
22:29
ideas of k-anonymity to help the client, A, be able to audit
22:35
what data has actually been collected in a much more efficient way.
22:39
So it's very easy to know what you have and whether or not
22:41
it's in breach of any privacy guidelines you might have.
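As a rough illustration of the equivalence-class idea, here is a minimal sketch in Python; the field names are invented and this is not Conductrics' actual schema, just the general pattern of keeping one counter per combination instead of one row per user:
```python
from collections import Counter

# One counter per equivalence class: every combination of the few,
# coarse fields you deliberately chose to collect.
classes = Counter()

def record(variant, device, converted):
    # The only thing ever stored is which class the visitor fell into.
    classes[(variant, device, converted)] += 1

record("A", "mobile", True)
record("A", "mobile", False)
record("B", "desktop", True)

# The audit is trivial: the whole collection is one small table, and
# the smallest class size shows how close any combination comes to
# describing a single individual (a k-anonymity-style check).
print(classes)
print(min(classes.values()))
```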
22:45
But also it means that we can do
22:48
the analysis in a much more computationally efficient way.
22:53
And so there's a lot of nice benefits from
22:57
following or embedding privacy by design principles into your systems and
23:03
procedures, which are beyond just having less data about the individual.
23:10
The main thing is that it encourages this idea of intentionality, just being
23:14
aware of what you're collecting and why. But that doesn't mean it's appropriate
23:18
in all cases. That's not what I'm saying here. It's just more of
23:22
an option. Well, and Matt, because I've now read and seen you talk
23:27
about this, like it kind of blew my mind a little bit when
23:31
it sort of clicked. And I think it was an indication of how
23:35
sort of stuck in the standard way of doing things was that when
23:39
it comes, if we just talk simple A/B testing on a website,
23:43
and we know that we need to know, let's just go with A
23:46
and B, that we've got, that you're treated with A, you poke around
23:50
on the website some more, you convert or you don't convert,
23:53
store a row. You're B, you poke around on the website, you convert,
23:57
maybe you don't convert, and the amount. And it seemed like,
24:01
well, obviously, you have to have every one of those rows.
24:05
And then when you're done, you just kinda,
24:09
you pivot it and you compare the conversion rates and you gotta do
24:12
some other little t test kind of math.
24:15
And what kind of blew my mind is you were like,
24:20
well, wait a minute, what if instead you just incremented
24:24
counters? Because that step that I just glossed over of saying,
24:28
I've taken 10,000 rows of individual users and rolled them up so that
24:33
I could do the actual calculations that are done behind the scenes,
24:38
you were like, well, wait a minute, if what you need is a count, you
24:42
can just increment how many people got A, how many got B.
24:45
If you need the sum of how many converted, you don't have to
24:51
have all those rows, you can just increment a counter and say,
24:54
you're A, I need to track you in the session long enough to
24:58
increment the counter, I don't need to store a whole row,
25:00
I just need to increment a counter. And then where it really clicked was
25:03
like, oh, and then if you need sum of squares, I can square
25:08
each value and then do the sum. 'Cause like, so, like you're literally
25:14
getting from, you have what was 10,000 rows and it winds up being
25:19
two rows that you're just incrementing. And that was kind of
25:23
your point saying, I can do, I can give you all the results
25:27
that you get from a standard A/B testing platform
25:31
in a standard basic A/B test. And that's just one scenario,
25:35
but I didn't gather even IDs. I just had to have in a
25:39
very limited temporal way until I could log
25:44
which class they went in and what the result was. And I can
25:48
just keep incrementing that. So one, did I state that fair?
25:52
Like that, if the listeners are like, what is he talking about, anonymization?
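For listeners who want to see Tim's counter idea written out, here is a minimal sketch in Python. It's the generic sufficient-statistics trick, with invented values, not Conductrics' actual code: count, sum, and sum of squares per arm are enough to compute a Welch t statistic with no per-user rows at all.
```python
import math

# Three running counters per arm replace the 10,000 stored rows:
# n (visitors), s (sum of the metric), ss (sum of squared metric).
arms = {"A": [0, 0.0, 0.0], "B": [0, 0.0, 0.0]}

def observe(arm, value):
    a = arms[arm]
    a[0] += 1            # count
    a[1] += value        # sum
    a[2] += value ** 2   # sum of squares

def welch_t():
    (na, sa, ssa), (nb, sb, ssb) = arms["A"], arms["B"]
    ma, mb = sa / na, sb / nb
    # Sample variances recovered from the aggregates alone.
    va = (ssa - na * ma ** 2) / (na - 1)
    vb = (ssb - nb * mb ** 2) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# observe("A", 0.0) for a non-converter, observe("A", 49.99) for a
# $49.99 order; no ID or per-user row is ever written.
```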
25:56
Yeah, I don't wanna, yeah. So yeah, I don't wanna get in too much like, because this is like, this is gonna, I don't wanna lose
26:00
the listener here in too much minutiae here. But just to, but yes,
26:04
you're right. And so really, the realization was,
26:08
and what some of the listeners I'm sure are aware of,
26:11
but some may not be. To be fair, you headed down the k-anonymization path
26:15
before I tried to do my summary. So
26:19
I don't want to be like, oh Tim, oh Tim, you're getting too detailed in
26:22
the weeds. No, we're getting, yeah, no, and really let's blame Julie because
26:26
we said beforehand that she was supposed to keep us from.
26:31
But just at a high level, it turns out that actually,
26:36
what underpins most of the analysis, or an approach to most of the analysis, of
26:43
the tasks that folks in experimentation need to do,
26:48
is really regression. It's like least squares. I'm not gonna go into
26:52
like, we don't have to go into like how it's done and all
26:55
that stuff. But it turns out that one is able
26:59
to do a regression analysis, do various regression analyses on data that
27:05
has been stored in equivalence classes in a certain way.
27:09
So the main takeaway is that we can store data in an aggregate
27:15
way such that we can do the same analysis as if we had the
27:20
data or most of the same types of analysis as if we had
27:25
the data at the individual level. And so
27:29
what are the types of tasks that we can do? Well,
27:32
as you said, we can do t tests which is sort of like
27:34
the basic frequentist approach for doing an experiment when we're kind of
27:37
trying to evaluate the treatment effect and try to account for the sampling
27:42
error. But also things like multivariate analysis and ANOVA, analysis of
27:48
variance, which you might do for multivariate tests. You might be doing something
27:53
like interaction checks. So maybe you have some sort of, like Conductrics
27:57
has some sort of alerting system where we're checking between different
28:01
A/B tests whether one A/B test might be interfering with another.
28:05
Underneath the hood, that's really for the folks who know some stats in
28:10
your listener base, it's really just doing like a nested partial F test
28:13
between two regression models, a full model and a reduced model.
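For the stats-inclined, the interference check Matt describes reduces to something like this sketch in Python with numpy. The data here is random stand-in data, and it is shown on individual rows for brevity; the same residual-sum-of-squares quantities can be accumulated from equivalence-class aggregates:
```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

rng = np.random.default_rng(0)
n = 1_000
a = rng.integers(0, 2, n)    # arm assignment in test 1
b = rng.integers(0, 2, n)    # arm assignment in test 2
y = 1.0 + 0.2 * a + rng.normal(size=n)

ones = np.ones(n)
reduced = np.column_stack([ones, a, b])        # main effects only
full = np.column_stack([ones, a, b, a * b])    # adds the interaction

rss_r, rss_f = rss(reduced, y), rss(full, y)
df_extra = full.shape[1] - reduced.shape[1]    # one extra parameter
df_resid = n - full.shape[1]
F = ((rss_r - rss_f) / df_extra) / (rss_f / df_resid)
# A large F suggests the two A/B tests are interfering with each other.
print(F)
```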
28:17
All of those things can be done and even. I was gonna say
28:19
that, but I was trying to keep it at a little higher level. It's just
28:21
more than t tests and even, there's like a lot of buzz and
28:27
I think exaggeration around things like CUPED, which is really
28:34
regression adjustment in the experimentation space. Even that
28:38
can be done on aggregate data. Now, the main point about it being
28:44
aggregated is really about data minimization, which is one, reducing the
28:49
cardinality of any data field, which is the number of unique elements that
28:53
we might wanna store. So instead of storing
28:58
the sales data, the pre-sales data of the user
29:01
at some arbitrary precision of cents, maybe it makes sense to have it
29:06
in some sort of 10 bins that represent sort of the average value
29:11
of each bin. So from zero to 10, where the average value in
29:16
the 10th bin is like $1,000 or something. So the main idea is
29:21
to reduce sort of the fidelity and sort of down sample some of
29:26
the data that you're collecting so that you have less unique elements
29:31
within each data field and to collect fewer data elements and maybe to
29:36
decide when you wanna co-collect elements. So
29:40
one can collect the data such that, let's say there's 10
29:44
segments, types of segment data that we might wanna collect within the experiment.
29:49
We can store those as 10 separate tables so that you can do
29:53
10 separate analyses or you can have them stored, you can collect them,
29:58
co-collect them. Maybe we wanna have these two or three collected at
30:01
the same time or maybe up to 10. As you add,
30:05
you co-collect data, you increase the joint cardinality, the number of unique
30:11
combinations and that's the thing that you kind of wanna manage.
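Here is a tiny sketch of that joint-cardinality check in Python, anticipating the minimum group size Matt gets to in a moment. The bin width, the threshold, and the field names are all invented for illustration:
```python
from collections import Counter

K = 10  # minimum acceptable class size (the k in k-anonymity)

def bin_revenue(cents, width=10_000):
    # Down-sample arbitrary-precision cents into coarse $100 bins.
    return cents // width

visits = [("A", "mobile", 12_345), ("A", "mobile", 11_002),
          ("B", "desktop", 250_000)]  # ...imagine many more rows

classes = Counter((variant, device, bin_revenue(rev))
                  for variant, device, rev in visits)

too_specific = {combo: n for combo, n in classes.items() if n < K}
# Any combination seen fewer than K times describes too few people:
# widen the bins or drop a co-collected field until none remain.
print(too_specific)
```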
30:15
It's like how many unique combinations of segment information do we wanna
30:21
collect? And the measure that we might wanna use is the number of
30:27
users that kind of fall within each one of those groups,
30:30
each of those combinations. And maybe we wanna have at least 10 users
30:35
that fall into each one of those combinations such that we're never really
30:40
collecting data on any individual user, we're collecting data on collections
30:45
of users who look exactly the same. And so that's really that idea
30:50
of k-anon is how many other people look exactly the same in
30:56
the data set. And so you might wanna have some sort of lower
31:00
bound on that, say five or 10. And that's a good way to
31:03
measure, it doesn't provide privacy guarantees, but at least it's a good
31:08
measure to be aware of how specific or the resolution of the data
31:16
you're collecting about each individual. I like what you're saying. I think
31:21
one of the challenges that I'm thinking of right now and maybe it's
31:25
just dumb, but I feel like a lot of organizations lack
31:30
the underlying knowledge to start making those groupings or buckets
31:35
in the first place. And then sort of my question is sort of
31:39
then how do they get that level of information or knowledge to be
31:43
able to take that next step? Or is it they feel emotionally like
31:48
they're making the buckets, they're like, but buckets are less precise.
31:51
I need to be more precise. And that's just, right, that's the... I feel
31:54
like, that's going back to the first thing, which is sort of like
31:56
our nature is to just try to glom on to every piece of
31:59
information possible. But like there's just people with a lack of knowledge.
32:03
So let's say somebody said, hey, I'm gonna fight my instincts
32:06
to try to do this privacy by design. And now what I need
32:10
to do is I need to group users like the way you just
32:12
described to do k-anonymization. How do I know how to set those
32:16
up so that they're gonna be realistic? Well, how do you know the
32:21
data to collect? I mean, first of all, you're making the decision at
32:24
a certain level of granularity anyway, like that's implicitly being done.
32:28
Secondly, again, I just wanna step back. This isn't the main, the main
32:33
takeaway here really is about just at least being thoughtful about it.
32:37
It may be that you don't change your behaviors at all.
32:39
Maybe totally fine. And in the whatever context someone is working in,
32:44
it may be appropriate. One use case is like, let's say you're in
32:48
a financial organization or healthcare, where you're in a regulated
32:56
industry, or you want to have some sort of,
33:02
you have to collect the data anyway, let's say
33:05
it is private data, but you wanna do analysis.
33:09
There's this idea of sort of global and local
33:12
privacy that really comes from differential privacy. Global privacy is
33:17
where you have a trusted curator, right? And so
33:24
you have the data. Think: a good example of this would be the
33:26
US government and the census. So the data that's collected by the census
33:31
is extremely private information about citizens. And when that data is released,
33:38
it needs to be released in such a way that private information about
33:41
any individual is not leaked. And so in that case, the trusted curator
33:48
is the census bureau, but they have a mandate to release information for
33:53
the public. And so you could be in a situation where
33:57
you're an organization that has this information and you wanna do analysis.
34:01
So you might wanna release data to your analyst team
34:06
of the private data that has been privatized in some way.
34:10
And so one would be to use data minimization and this sort of
34:14
idea of k-anon. But there's other approaches. There's differential privacy.
34:18
And so that's something I know, I just spoke at the PEPR Conference,
34:22
which is a privacy engineering and respect conference. And like there's
34:26
Meta is there and Google is there and whatnot. And they often have
34:30
situations where they collect data and they wanna
34:33
build tools or analytics on it. But they release internally data that
34:37
has either been subject to differential privacy or various data minimization
34:42
principles. So that's one of these. Can you define, can you, how easy
34:46
is it to give a high-level explanation of what differential privacy is and how it works? Well, I'm not an expert on
34:52
it and it's not super easy. But at the high level,
34:57
as far as I understand it, it's essentially,
35:01
I believe it's the one approach that actually provides privacy guarantees.
35:07
So you actually have a particular privacy guarantee around it. And the main
35:11
idea is that you inject a certain known amount of noise into the
35:19
data. So the data is perturbed by a certain quantity of noise,
35:25
which is defined by what's known as a privacy budget.
35:30
So basically you inject noise. It's usually either Laplacian noise or Gaussian
35:35
noise into the data set such that when a query comes back,
35:41
it's a noisy result. And so it essentially has certain guarantees that
35:48
any individual, you have a difficult time differentiating between
35:52
two data sets, one that has a particular individual in
35:56
it, and an adjacent data set that's the same, except it does not
36:00
have that individual in it. And whether or not the query results are
36:05
consistent with or without that individual. And so
36:09
that is probably terribly unclear to the listener, but the main idea is
36:13
that you inject noise into the data set.
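A stripped-down sketch of that idea in Python, the classic Laplace mechanism for a counting query; the epsilon here is an arbitrary choice for illustration:
```python
import numpy as np

def noisy_count(true_count, epsilon, sensitivity=1.0):
    """Laplace mechanism. Adding or removing one individual changes a
    count by at most 1 (the sensitivity), so noise with scale
    sensitivity / epsilon masks any single person's presence."""
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

print(noisy_count(10_000, epsilon=0.5))  # smaller epsilon, more noise
```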
36:17
It's actually quite complicated. And at first it looks amazing.
36:20
We took a look at it and we were thinking about doing it.
36:22
And I believe the census now is using differential privacy and it is
36:28
useful in a situation where you need to release a lump of data.
36:34
You need to release one particular query, like the census and they release
36:41
the results and they've applied a differential privacy mechanism to it.
36:50
It gets a lot more complicated when there's a lot of ongoing queries
36:53
on the data because there's a privacy budget and there's this idea of
36:56
composition, simple composition, advanced composition. It's somewhat related,
37:01
actually it's deeply related to Neyman-Pearson hypothesis testing.
37:05
And so these ideas about inflation of type one error rates and all
37:09
that stuff is not completely dissimilar to the idea of consuming privacy
37:14
budget and whatnot. And so it's not clear to me, one, how one would
37:17
actually manage it in an organization and, two, whether or not organizations
37:22
would accept noisy data. People kind of freak out about that.
37:25
But there is this trade-off, of course, between privacy and
37:29
utility. But again, the interesting bit, I think the takeaway is one,
37:36
privacy by default is the law, at least in Europe and to various
37:41
degrees in different states. And what I found
37:46
can be often frustrating is that most of the privacy conversation is around,
37:51
again procedure and compliance. It's like you can't do this. And it's like
37:57
not productive. It's like, well, what, like help, give me some tools to
38:01
think about what we actually can do. Like if you care about outcomes.
38:06
And what is, I think of interest for the listener might be is
38:10
to look into privacy engineering, which is really
38:14
more a community and set of approaches about design-based thinking to build systems
38:19
that have properties, privacy properties in them. And that gives a way forward
38:26
to actually build stuff and to build stuff that has these privacy properties
38:32
as part of them, as opposed to what I feel a lot of
38:36
the privacy conversation is about not doing stuff and people trying to like
38:41
block you from doing anything, very sort of bureaucratic in its approach,
38:45
very legalistic. And this is a much more engineering approach and really.
38:49
This whole conversation that we're having is really just about providing
38:54
an example of a company that has applied these privacy engineering principles
39:01
to their software. Now it's really gonna be up to everybody else to
39:04
decide when and where it's appropriate for them, but it is a way
39:09
to actually build stuff as opposed to just
39:13
not being able to do anything. So it's interesting, I never read the
39:18
seven principles, the Privacy by Design Seven Principles, until
39:22
prepping for this episode. And you, because you bring up principle number
39:25
two a lot, but principle number seven is the respect for user privacy
39:29
and keeping the interest of the individual uppermost. And I feel like that
39:34
may be a cudgel that I start swinging around like I... Watching on
39:40
LinkedIn, people are posting these diatribes. If you're not taking your
39:44
first party data and pumping it into this other system and giving it
39:48
to that, what are you doing? This is insane. And it's,
39:53
you quickly watch the comment thread. Some people say, yeah, I use my
39:57
tool to do that. You have other people arguing about the logistical complexity
40:02
of doing it. And then there's like a tiny little thread that is
40:05
saying, is that in the individual's best interest? Like everything about
40:11
that. Sometimes it is. I think you were using an example earlier that
40:15
if you need data from somebody in order to provide them something that
40:18
they want, it is in their interest to provide it. But that feels like another whole
40:24
tranche of the MarTech industrial complex that... There is nothing in
40:30
that about principle number seven, keeping the interest of the individual uppermost,
40:37
which I think is another piece of that, that maybe just a little
40:41
another hobby horse I can mount and gallop around on. Yeah.
40:45
Well, seven and two I bring up mostly because it's privacy as the
40:50
default. That's key. I think that's the key bit is that it should
40:54
be the default. And I definitely think, one should not be getting their
41:02
guidance from the marketing tech industrial complex. Like that's a problem
41:09
because there's perverse incentives there. That industry is incentivized
41:14
to push, collect everything and magical thinking like people will sell a
41:20
magic box if people wanna buy a magic box. And I think that's
41:25
the antithesis, I think of being thoughtful and mindful about why you're
41:30
doing something. Unless the optics of buying a magic box have value,
41:34
that's okay. I don't... It's not for me to judge like, what is... Why
41:40
you're doing something? It's just one should have thought about why they're
41:43
doing something. But it feels like this way of thinking will end up
41:46
being more productive for people long term though. Because we are,
41:51
to your point, going to continue to run into
41:55
restrictions privacy wise. And I think people that are still holding onto
41:59
this idea that I have all this historical data and if I can
42:03
just look backwards and answer any question and understand each individual
42:06
and watch their entire path through my website, I'll be able to answer
42:09
any question, I need to make any decision about the business.
42:13
But it feels like if someone could let go of some of that
42:16
baggage of the way the industry and the story's always been told to
42:19
us. That you can start by saying like, what is the best question
42:24
to answer right now for the business to make a decision moving forward?
42:27
And what's a way to actually ask that and answer it looking forward
42:30
again by doing experimentation rather than trying to do a very complex historical
42:35
analysis. And then you can go about actually designing and engineering the
42:39
data again, moving forward. And I run into this so much with my
42:43
clients where I do feel like you just get stuck in the cycle
42:46
of looking backwards. That it is refreshing to hear that there are
42:51
tactical steps and a way of selling that forward-thinking mindset instead.
42:57
And seeing that it could be really freeing for probably a lot of
43:02
companies. I don't think it has to be experiments. I think you could
43:06
even have stuff that if you're not tracking something and they're like,
43:09
well what's going on here? It's like, well, we could just keep a
43:12
counter. We're at a physical store and somebody's saying, well, we wanna
43:16
know how many people are looking at... How many people look at produce
43:20
versus toilet paper. And one option would say, well we gotta have cameras
43:25
mounted. So we've tracked all of that so we can answer it just
43:28
in case if you ask that question. Or if all of a sudden
43:32
that becomes a very important question to answer, say,
43:36
cool, we're gonna take all that money we didn't invest in this super
43:39
complicated tracking system that had to store everything and we're just
43:43
gonna send some resources. It's gonna take me two weeks to answer the
43:47
question, but very, very precisely. 'Cause I know exactly
43:51
what you're looking at and it may not be even an experiment.
43:55
It does seem like a... It is such a radical
43:59
shift, like a change in... I'm not optimistic that we're gonna be able
44:03
to affect that sort of a shift because there are a lot of
44:08
pressures that don't want it. And, Matt, I think to your point,
44:13
it's so easy to get sucked into the compliance mindset for privacy.
44:18
Well, what do I, my default is everything, what do I have to
44:22
turn off or what layers do I have to put on
44:25
so that I'm backsliding at a slower rate from what I'm used to
44:29
doing, as opposed to... And you hit on it quickly, this, the simplicity
44:36
of the computation. Well there's a simplicity of if you have no data
44:41
and you have a really clear question and you say, what's the minimal
44:44
data I need to collect to answer that question? That in many cases
44:48
becomes a lot simpler for a lot of the questions. Now the problem
44:53
is, you're leaving a few questions that you could have answered otherwise,
44:56
I guess. And this isn't, and just to be clear you're not tied
44:58
to the old way they were collecting it. So many times you ask
45:02
a good question and the data they have in that topic is not
45:05
in a way you can even use it. So I love though that this frees you up to say, how exactly do I need the data
45:10
to answer the question instead of, again, you're married to the baggage
45:14
of what's already been done. And they're like, well, I spent a lot
45:16
of time and money and effort. So you gotta figure out how to
45:19
use it. Also... That's a great point. And also, just to be clear,
45:23
this isn't like Gershoff's point, this isn't like me, this is like,
45:29
it's encoded in the law. That's what... It's Gershoff's law. No. Yeah. It
45:33
has nothing to... It is now. 100%. It's not like I'm bringing this
45:37
to the table. It's like that's privacy by design is embedded in things
45:43
like GDPR, article 25 in principle five, 5C I think. So it's not
45:49
like I am suggesting that people do this special thing.
45:54
It's really, this is what's out there. This is part of the expected
45:59
behavior, especially at least in Europe, I guess. And what are some ways
46:05
that we might wanna think about it and, oh yeah,
46:08
also it, I think, supports this idea,
46:12
which I think is really the main point from my perspective.
46:16
Is that the value of... The value is not in this technology.
46:20
It's not in our software or other company software.
46:24
It's not in any statistical method or in the analytics method.
46:28
It's really about being thoughtful about what it is you're trying to do
46:33
and being thoughtful about what the customer might care about and being
46:38
explicit about how you're allocating resources and then thinking about things
46:42
at the margin. And a nice added benefit of thinking about data minimisation and
46:48
privacy engineering is that it is consistent with thinking that way.
46:54
That's really the main thing. I think that's what's nice about it is
46:58
that it helps us think through and be,
47:02
have clarity about why we're doing stuff. What you wind up doing
47:07
is not for me or any of us to say it's really gonna
47:10
be ultimately for everyone in whatever context they're in.
47:14
That's all. It's really just calling that out that
47:17
we can actually have sort of outcomes. One of the... It's not gonna
47:22
be my last call, but it's Jennifer Pahlka who wrote Recoding America.
47:30
There's a really good podcast with her on Ezra Klein's podcast.
47:36
And I think she has great clarity on where she talks about
47:41
procedural thinkers and outcome-based thinkers. And I think that's a really...
47:47
She kind of frames it in a way that I think about all
47:51
the time and a lot of privacy conversation is really procedural.
47:54
It's like, have you followed this process? Have we hit the
47:59
check marks? Yeah. Great. But it's sort of like, it doesn't tell you
48:03
how to do anything. It doesn't tell you about how to improve your
48:07
outcomes, whereas the privacy engineering side of things is really outcomes
48:10
based. It's like, how do we actually do stuff? And I think
48:14
the one thing that is the theme that runs through analytics and marketing
48:20
analytics specifically is about outcomes. We really should be caring about
48:24
outcomes and actually being productive. You can say that it's not you saying
48:30
this, but as you're saying that, I think you're pointing it out,
48:34
but if you look at all of the hand-wringing around
48:40
GDPR and different kinds of privacy legislation in Europe, and then they're,
48:46
oh, these countries are saying that their interpretation is Google Analytics
48:50
is not valid. As soon as that sort of becomes
48:55
the debate, it becomes the regulators don't understand
49:00
digital and that's not reasonable. And let us rationalise why
49:06
the way that we're doing things is fine.
49:10
So that then, that just sucks all the oxygen out of the conversation
49:14
is what's the ruling gonna be as to whether this platform is allowed
49:19
in this region based on this argument. And it feels like it just
49:26
by default moves four steps away from the underlying
49:30
intent and the principle and then has a debate kind of in the
49:34
wrong space. Where you're pointing out that like, no, no, no,
49:38
where it started is valid and let's not rip it away from there
49:44
and go have an argument somewhere else that's already missed the point.
49:48
Yeah. And you don't have to be part of that argument.
49:50
That's like... You don't... That's a decision that you make. Like is that
49:58
what you care about? It's not what I care about. And so
50:02
we just wanna make good product and that's respectful of our users and
50:07
is consistent with some of these principles. And it has some nice benefits
50:12
and we're just, I'm chatting with you all right now is really like
50:16
here A is an example, and then also B. Again,
50:21
making sure we just don't mindlessly collect data. Now, the reason
50:26
to push back on that is that privacy or data minimisation is the
50:31
default. And so you make that what you will. It's really gonna be
50:36
up to everyone else, but I think it's valid just to sort of
50:40
point it out. But yeah, there's a lot of nonsense out there,
50:43
Tim. So what? There's a lot... There's... I mean if you're getting your
50:51
information from LinkedIn primarily, what's LinkedIn? It's like a lot of
50:55
people self-promoting their stuff and people, like, are they really
50:58
experts? You look at it, a lot of people aren't
51:01
and there's a lot of nonsense multipliers. There's a lot of agencies out
51:06
there. People just, you gotta step back and think about what the perverse
51:10
incentives are and there's a lot of perverse incentives out there and
51:15
a lot of folks are selling product and are selling services.
51:20
And what is new often is something that they can use to sell.
51:24
And I just think by being, again, I don't wanna overuse the word intentional,
51:29
but just being thoughtful and mindful is a
51:34
protection against acting in a way that isn't rational and you can
51:40
bump up what they're saying to see if it's sort of consistent
51:43
with what your actual needs are. And again, I sell software and so
51:47
people can be... I have my biases as well and so
51:54
I'm well aware of that. But again, this is stuff that is not made up by us, by me.
52:02
It's kind of the law and just a way of thinking about it.
52:06
But again, we're not selling, there's no one way to do things and
52:10
we're not being paternalistic about it. It's not for me to say or
52:13
any of us to say how others should... Well, you all are, some of you are, consultants.
52:17
So I guess it is kind of for you to give guidance.
52:20
But it's ultimately... The way we look at it, it's our job to
52:25
give... It's almost like being a doctor and there's various treatments and
52:30
we may have a preference about what we think a type of treatment
52:33
works, but it's ultimately up to the client to think through what are
52:39
the trade-offs between different interventions? And does one approach
52:45
work better for them? They are in a better position to know.
52:48
It's just really our job to give them options and ultimately if they
52:54
do something they wanna do an approach that isn't what we would've done,
52:58
that's totally fine. It's not for us to say. It's just our job
53:02
to give, to be acting in good faith and kind of give them
53:05
options. I love that we've got this conversation done now 'cause I think
53:10
we're gonna be referring to it again and again and again over the
53:14
next many years. This is good on a lot of levels
53:19
for a couple reasons. One, because when we start seeing vendors in five
53:23
years talking about this, we'll know where it came from.
53:28
And as we sort of seek out and pursue sort of almost like
53:34
a new set of first principles as analysts around how incorporating privacy
53:39
in a proactive manner works. It's starting at this sort of juncture.
53:44
It's a lot of food for thought. All right. This has been outstanding
53:49
as per usual and thank you Matt. Thank you very much.
53:55
Well thank you so much for having me. It's been a real pleasure. It's good.
54:00
I've got a lot of thoughts going on as I usually do when
54:03
we talk and none of them are very well formed and most of
54:07
them probably don't make any sense. So it's gonna take a while. But
54:10
this is really good and I think I echo what you were saying,
54:13
Julie, which is sort of like, this is the first time I've sort of looked
54:17
at privacy stuff and not felt sort of like this,
54:20
oh, they're just crushing our fun and we have to follow all these
54:23
rules. There's now sort of like, okay, there's a path forward and I
54:26
can get excited about that. Now I'm intrigued and I wanna go learn
54:31
more about how do I incorporate that as part of a central part
54:34
of my path out from here. Which I think is. Yeah. Can I
54:38
just say, I do, to echo that, Michael, I started to feel at
54:44
the very end I was starting to culminate all my thoughts finally into
54:47
something coherent of, I really like that this way of thinking gets rid
54:52
of the fear of feeling like they're losing something with the privacy
54:57
laws out there and the new regulations coming. Because I feel like that's
54:59
what always the conversation is about is we're losing this, we're losing
55:04
that, oh no, you wanna hold on tighter because you feel like things
55:07
are being pulled away from you. But this kind of breaks that fear
55:10
cycle and, yeah, it feels kind of like a new day.
55:13
Like, oh, turn the page. There's a new way to start.
55:16
You can start fresh, it's okay. None of our tools support it yet,
55:20
but then we can start going and building that future. No.
55:23
Not yet. Come on. Come on. Yeah. There might be one. That was quick. That
55:26
was a quick... That took all of 43 seconds. It's always... somebody's been
55:37
thinking about this back in 2015. Oh. Like I said, in five to seven
55:42
years when some of the vendors start talking about this, you know where
55:45
you heard it first. All right. One thing we would love to do
55:49
on the show is go around the horn and share a last call.
55:51
Something that might be of interest to our audience. Matt, you're our guest.
55:54
Do you have a last call you'd like to share? Sure.
55:56
Actually, is it okay if... I have a couple. Yeah. Go for it.
56:01
One is, since we were talking about this, and I just wanna be clear that I am sort of adjacent to
56:07
it. I'm not an expert in the privacy engineering space, but there are
56:11
experts there. It's just amazing community and I highly recommend anyone
56:16
who's interested in any of this to attend PEPR, which is the Privacy
56:19
Engineering Practice and Respect conference. It just happened last month
56:24
and it's coming up next year. But I highly recommend folks,
56:27
and I can give you all a link if you wanna put that
56:30
on the page for the podcast. Really some of the most inclusive...
56:35
Which actually, that's, is it through USENIX? So we're
56:39
gonna... We'll link to the talk you did there, it's available on YouTube,
56:43
right? Yep. It's that conference and really, it's some of the smartest people
56:47
you've ever met and also some of the warmest and most inclusive community.
56:54
It's very Star Trek rather than Star Wars vibe. So it's great and
57:02
then kinda more literary but sort of think, we talked a little bit
57:06
about cardinality and sort of ideas of information and whatnot is kind of
57:11
the... I recommend the short stories of Borges, the Argentinian
57:17
writer, The Garden Of Forking Paths and the Library Of Babel, those are
57:23
two of his short stories. And I think if you wanna be like
57:28
in the know data scientists, like sort of a literary data scientist, those
57:32
are two good short stories to have read. And then once you start
57:35
reading those, you'll get hooked. So that's my last call. Wait. I assume
57:40
it will make it through the editing, but I was introduced to the
57:43
Library Of Babel by Joe Sutherland as we were working on this book.
57:46
So we have a whole... It's actually in the book that we're working
57:49
on as an explanation and illustration of the Library of Babel.
57:53
So I should actually read the short story I guess, instead of just
57:57
the Wikipedia entry. Oh, no, it's great. Yeah, you should read both. And
58:00
definitely Garden of Forking Paths, which is often referenced in
58:06
research design; people refer to that when talking about researcher
58:10
degrees of freedom and reproducibility of studies and whatnot.
58:16
So there's a lot of the ideas that are adjacent to what we
58:20
work on are embedded in these great short stories. Very nice.
58:25
All right. What about you, Julie? What's your last call? My last call
58:30
is actually inspired by a previous show not long ago with Katie Bauer. I
58:37
was looking through some of her different articles and I came across one
58:40
that was titled Deciding If A Data Leadership Role Is Something You Actually
58:44
Want To Do. It was an interesting read overall, if that's like a
58:47
point in your career that you're at, but I just felt like she
58:52
broke it into a lot of helpful ways that she thought about making
58:56
a decision about what next role she wanted.
59:00
And she talked a lot about titles, and ways she thinks about
59:04
titles, which I think a lot of people run into at different
59:06
points in their career. So I thought that was just a great way of
59:10
framing it. She then listed a bunch of great questions that she actually
59:13
used when going through interviews for different roles and I kind of started
59:18
to think about how I feel like they would be super helpful,
59:22
even for me as a consultant, thinking about asking my stakeholder, or can I
59:26
ask, or can I figure out the answer to these types of questions:
59:29
where does my stakeholder sit in their org, what is their actual
59:33
job, what is their role compared to their peers? What is their manager
59:37
like, who are they working with? What are their relationships like?
59:40
And she just outlined a lot of different great scenarios of how data
59:43
teams fit within organizations. And so whether you're using those questions
59:47
to ask when you are interviewing for new roles or like I said,
59:50
I'm kind of inspired to use them in different scenarios. I thought it
59:54
was a great read. Excellent. All right, Tim, what about you?
59:58
So I feel like I'm gonna be pulling some of these as we've
1:00:01
turned in the initial full draft manuscript for the book, which means I've
1:00:05
learned a few things that I'd either forgotten or were new things coming
1:00:10
out of the brain of Joe Sutherland. And
1:00:13
one of them is, it's an oldie but a goodie. It's kind of
1:00:17
an academic paper available through the National Library of Medicine at the NIH
1:00:23
and the paper is titled, Parachute Use to Prevent Death and Major Trauma
1:00:28
Related to Gravitational Challenge: Systematic Review of Randomized Controlled
1:00:32
Trials. So it's from 2003 and it is a brief academic paper where
1:00:39
these two people who basically kind of dared each other, the notes at
1:00:42
the end kind of explain, hint at what happened. But basically they were
1:00:46
looking, saying if scientific evidence really requires a randomized controlled
1:00:51
trial for high stakes things, then surely we should just go do a
1:00:55
survey of all the randomized controlled trials around the efficacy of parachutes.
1:01:00
And the result... They had a whole plan on how they were gonna
1:01:03
find the outcomes and their meta-analysis and what they were gonna do.
1:01:06
And the result was that "our search strategy did not find any randomized
1:01:09
controlled trials of the parachute." So it's kind of a little bit of
1:01:13
poking fun at the scientific community, but in a kind of a delightful
1:01:18
way with some pretty funny footnotes. And it actually did get kind of
1:01:25
published in a way. So it's just kind of a good reminder of
1:01:29
being clear on the question you're trying to answer and what
1:01:32
your options are for answering it. So that's random.
1:01:37
What about you, Michael? What's your last call? Well, it's interesting.
1:01:41
I had a conversation recently with my niece who's getting ready to start
1:01:44
the school year and she's taking an AP statistics class, which I didn't
1:01:49
even know that kind of class existed in high school.
1:01:51
But we started talking about some of the pre work that she got
1:01:54
assigned and I realized I was like starting to explain some foundational
1:01:59
statistics concepts that she was kind of struggling with. And it reminded
1:02:03
me of this book I read early in my career called The Cartoon
1:02:06
Guide to Statistics. 'Cause whenever I go back to sort of those first
1:02:10
things, I'm always reminded of that book, which was recommended to me
1:02:14
actually by Avinash Kaushik way back in the day. So that's my last
1:02:18
call. I think I may have done it before, but it's been many,
1:02:21
many years. And that conversation sort of brought it back up.
1:02:24
So if you're getting into statistics or you just wanna have a better
1:02:28
foundation in statistics, that's actually a great book to have on your shelf
1:02:32
to pull off and read. And some of the stuff we talked about
1:02:35
today, I kept up with because I've read that book and it's a
1:02:40
cartoon so it's easy. So anyways, Cartoon Guide To Statistics. That's funny.
1:02:44
There you go. It's on my shelf and I never could make it
1:02:46
through it. I should. I should go back and read it now.
1:02:49
I feel like I was... Didn't... Yeah. I should try it again.
1:02:52
It probably would make more sense. Yeah. 'Cause you... What was funny was
1:02:57
how much I realized I'd actually learned over the years about statistics
1:03:01
in just trying to explain a couple things.
1:03:04
And I realized like, wow, I actually know a couple of things about
1:03:07
statistics now, which I think is important to know. But it's...
1:03:11
And I think, if we're being honest, all due to the Conductrics quiz.
1:03:15
Oh yeah. Absolutely. Absolutely. Full circle. It's a full circle moment.
1:03:21
A hundred percent. Well, yeah, this has been obviously such a great conversation and
1:03:25
I know as you're listening, you may have questions, you may have input,
1:03:29
there's things you might wanna share that we would love to hear from
1:03:32
you. And the best way to do that is through the Measure Slack
1:03:35
chat community, or... We're on LinkedIn as well.
1:03:40
And also you could email us at contact@analyticshour.io and I think,
1:03:46
Matt, you're pretty active on that community as well as on the TLC.
1:03:50
Yeah. Highly recommend folks sign up for the Test and Learn Community, run
1:03:55
by Kelly Wortham. That's a great space to learn about all things experimentation
1:04:01
in an inclusive space. Yeah, absolutely. And we heartily recommend it as
1:04:07
well. And it's a great place to explore these ideas and keep this
1:04:11
conversation going as well. So love to hear from you and
1:04:17
keep learning more about privacy engineering, privacy by design, k-anonymization,
1:04:22
differential privacy, I mean all new and amazing concepts for me today.
1:04:27
So awesome. All right. And of course, no show would be complete without
1:04:32
a huge thank you to Josh Crowhurst, our producer for all you do
1:04:36
behind the scenes to make this show happen. We thank you very much,
1:04:39
sir. And of course, thank you Matt so much for coming back on
1:04:44
the show. It's always a pleasure. Makes me reminisce about all the awesome
1:04:48
times we've had at SUPERWEEK and other places. It's always a delight to
1:04:52
hang out and talk. Thank you so much for having me.
1:04:55
I really appreciate you all welcoming me back and it was great to
1:04:58
meet you, Julie. Yeah, you too. Awesome. And I think I speak for
1:05:04
a random assortment of co-hosts that I may have,
1:05:08
that I've incremented a couple of times when I say, no matter how
1:05:12
you're trying to drive forward with privacy, remember,
1:05:15
keep analyzing. Thanks for listening. Let's keep the conversation going
1:05:21
with your comments, suggestions, and questions on Twitter at @AnalyticsHour,
1:05:26
on the web, at analyticshour.io, our LinkedIn group and the Measure Chat
1:05:32
Slack group. Music for the podcast by Josh Crowhurst. So smart guys want
1:05:38
to fit in, so they made up a term called analytics.
1:05:41
Analytics don't work. Do the analytics. Say go for it, no matter who's
1:05:45
going for it. So if you and I were on the field, the
1:05:48
analytics say go for it. It's the stupidest, laziest, lamest thing I've
1:05:53
ever heard for reasoning in competition. The text was like, Tim and Mo were
1:05:59
supposed to be cool, almost like secret agents and like just had their shit
1:06:03
together. And Michael was just kind of like, did you ever see, what's that
1:06:07
movie with Matt Damon and Alec Baldwin? And it's like all Boston and
1:06:13
Wahlberg. And there's that scene where Alec Baldwin is like the police commissioner
1:06:18
and he's all like frantic and he's sweating and he's just like, totally
1:06:21
discombobulated. That was how I thought of Michael, which just like totally
1:06:26
out of sorts, just... And, then Tim and Mo would just kind of come
1:06:31
in and just be like cool cucumbers and like, just have their shit together.
1:06:35
And Michael never played it correctly. And he edited it out.
1:06:39
He wouldn't say... Oh, but anyway. I sent... I had a dialogue for
1:06:47
him. No. That was the whole bit. Oh, man. But how did you really feel? But
1:06:56
Michael, I can't believe, like I thought he would just like lean into
1:06:59
it, but no, he was too embarrassed or he like didn't like,
1:07:02
he's like, his ego was too great to play. He just didn't commit. Yeah. He
1:07:06
just didn't wanna play it. I think, he just couldn't play it up.
1:07:08
He's like, I'm too serious for this. I'm not gonna be the one
1:07:11
who doesn't know what's going on. Well, you're not the one who's answering
1:07:13
the questions. That was the whole point. I didn't understand the vision.
1:07:17
But I just didn't understand the vision. I'm not cut out for high level
1:07:24
acting. Julie picked up on it. Julie picked up on it.
1:07:27
That was... No, Michael said that verbatim in one of the episodes.
1:07:31
He literally stopped midway into the quiz and he goes, why am I
1:07:34
always panicking? Why am I so frantic in this? That's the whole bit. That
1:07:37
was like the narrative theme. Mo and Tim were just like the 007s. Rock