Episode Transcript
0:00
Okay, I'm always
0:02
rushing. You'd think I
0:04
don't have time to
0:06
organize all this in advance. Why
0:09
is my head so big up there? The
0:12
top of OBS is like super
0:14
zoomed. I don't really need that, do
0:16
I? No, it looks okay over there.
0:19
It's fine. Good morning.
0:21
I am rushing a little
0:23
bit. Where are we now? It's
0:25
5:30 a.m. now,
0:27
and I've been up for an
0:29
hour and a half. We
0:31
are, we're traveling today is
0:33
a travel day. Hence the super
0:35
early start. I've
0:37
got to get rid of my head from
0:39
there. It looks massive. It's
0:42
a super early start. Nowhere
0:45
exotic today, off to Sydney.
0:48
So Sydney,
0:50
for those of you from other parts of the world,
0:52
you'll know about Sydney. Australia is a lot bigger
0:54
than just Sydney as it turns out. We
0:58
live, what, about an hour
1:00
20 flight away or something, so
1:02
it's handy that various things have
1:04
come up in Sydney all at
1:06
once so today we're going to
1:08
go and spend a bit
1:10
of time with one of our
1:13
consumers of Have I Been Pwned
1:15
things. Next week we've got some
1:17
law enforcement meetings with the folks from
1:19
a couple of different parts of the
1:21
world which is always really interesting I
1:23
like these meetings it's cool because you never know
1:25
exactly what's going to come up Anyway,
1:27
we'll see if there's anything out of that I can talk
1:29
about later on. Wayne
1:32
James Null Division, the game is over.
1:34
Yeah, good day. Welcome. Welcome
1:37
to 5:30
1:39
AM on the Gold Coast. One of the
1:41
weird things about Australia, many
1:43
weird things, is
1:46
keeping in mind Australia is about
1:48
the size of Europe or continental
1:50
US. We've got lots of time
1:52
zones, which is fine. But in
1:54
most parts of the world, when
1:56
you go up and down the country,
1:58
the time zones stay the same. When
2:00
you go left and right, I forget which
2:02
one's lat and long, they change. For
2:06
reasons that are still not
2:08
entirely clear, we
2:11
don't have daylight saving in
2:13
our state in Queensland, but 30
2:15
kilometers due south. New
2:17
South Wales, the state that Sydney is in, they
2:20
have daylight saving. So
2:22
you get a roughly half the
2:24
year where everyone there is
2:26
an hour later than everyone here.
2:29
And the border actually goes
2:31
down the middle of a street
2:33
in a town, a fairly sizable
2:35
town. So you've got one side
2:37
of the road that right now
2:39
is 5:33 in the morning for me. And
2:41
the other side of the road is 6:33. And
2:44
it also means when we go to Sydney
2:46
on a day like today so if you look
2:48
at my trip and it's like okay we need
2:50
to leave soon but then we land after
2:52
lunch, but oh yeah, because it's
2:55
an hour ahead. And fortunately I don't
2:57
have to go to an office or do
2:59
stuff like that anymore but if I had to
3:01
go to Sydney for a day in the
3:03
past it's very hard to leave here in
3:05
the morning and be there for 9
3:07
a.m. for half the year
3:09
because of the daylight saving thing Anyway,
3:13
fun useless Australian trivia. Apparently it's something
3:15
to do with either confusing the cows
3:17
or fading the curtains or some bullshit
3:19
reason like this. So we
3:21
don't have daylight saving. Johnny
3:24
Docs in the UK says he's off
3:26
to Sydney in a few weeks, looking
3:28
forward to it. Sydney's a nice city.
3:30
I don't mind going to Sydney. Melbourne
3:33
not as much; Sydney's
3:36
alright. And
3:39
Jane says, AU has 30 minute time zones, which
3:41
seems strange to some of us. Yeah, so that's
3:43
a bit of a mess as well. So
3:46
I think South Australia and the Northern
3:48
Territory are 30 minutes behind us. So you
3:50
know how when you're trying to communicate
3:52
with people somewhere else on a different time
3:55
on other side of the world and
3:57
you're like, are we going to meet on
3:59
the hour? It's like, well, it's
4:01
not the same way everywhere. It doesn't
4:03
work the same way everywhere, which makes it
4:05
a little bit messy. Yeah,
4:09
like James says: boss, you're an
4:12
hour late; employee: I'm working from
4:14
home, across the street. But
4:16
this is what it's like for families down
4:19
there. And a lot of the argument that people
4:21
have made for years and years and years
4:23
about why our state here, Queensland, should have daylight
4:25
saving, is that particularly when you're
4:27
on, I was going
4:29
to say border towns. There's really only one
4:31
border town, because if you've ever seen the population
4:33
distribution in Australia. We're pretty
4:35
much all on the coast and pretty much
4:37
all on the eastern side with a few
4:39
exceptions. So this one town down there, Tweed Heads,
4:41
like on the coast, there's a bunch of
4:43
families that do work on one side and
4:45
live on the other side or the kids
4:47
go to school on one side there's probably
4:49
kids Where one kid goes to school on
4:51
one side of the street and the other
4:53
kid goes to school on the other side
4:55
of the street and you've got to coordinate
4:57
around that. So that's... I don't know why
5:00
we haven't changed that. It's just, it is
5:02
a weird... anachronism? Is
5:04
that the word? Anyway, it's a weird thing.
5:08
Stealer logs. So I was
5:10
kind of thinking, what do I
5:12
talk about early today? And
5:15
I do have a big blog post coming on this.
5:17
And so one of the things I'm going to be doing
5:19
on this trip is finishing this blog post. But
5:22
if you recall, last
5:24
month, the thing that was bang on the middle of last
5:26
month, so probably about five weeks ago, we
5:28
published the stealer logs into Have
5:30
I Been Pwned, not unusual. There
5:32
were how many there like 220
5:34
million or something like that. I've
5:36
got to get my facts right
5:38
here, but for the first time
5:40
ever not only did we publish
5:42
the email addresses, but we published
5:45
a mapping of the email addresses
5:47
to the website domains. I'm using
5:49
my words really carefully here and
5:51
we've spent the better part of
5:53
the last month working on a
5:55
lot of this terminology. We'll
5:57
look for the number... I think
6:00
it was, uh, 220 million unique rows, and
6:02
just keep these numbers in mind because you
6:04
realize how big this last set is. Last
6:06
month 220 million unique rows
6:08
of email address and domain pairs, covering
6:10
69 million of the total
6:12
71 million email addresses in the
6:15
data. So what tends to
6:17
happen with that set, and the
6:19
set I'm talking about in
6:21
a moment, is that you've got
6:23
a truckload (I was going
6:25
to say something else), you've got
6:27
a truckload of text files. And
6:30
in the case of what we're going to talk
6:32
about today, let me get my numbers right. I
6:34
think it's 744 because we've been looking at this
6:36
nonstop for the last month. We
6:39
have 744 text
6:41
files. And as I've spoken
6:43
about with stealer logs before, the stealer logs
6:45
are the URL that was in the address
6:47
bar when the infected person entered (so
6:49
that's part one), their email address, and the
6:51
password. So there's three parts. Now
6:54
we use our email extractor tool
6:56
just to go in there and
6:58
grab email addresses out of
7:01
text files, totally indiscriminate.
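As a rough illustration of what an indiscriminate extraction like that might look like in .NET, here's a minimal sketch; the regex, file pattern and output name are assumptions for illustration, not the actual email extractor tool:

// Illustrative sketch only: scan text files and pull out anything email-shaped.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class EmailExtractorSketch
{
    // Deliberately pragmatic pattern, not RFC-complete (an assumption for illustration).
    static readonly Regex EmailPattern = new Regex(
        @"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
        RegexOptions.Compiled);

    static void Main(string[] args)
    {
        var unique = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (var path in Directory.EnumerateFiles(args[0], "*.txt"))
        {
            foreach (var line in File.ReadLines(path))       // streams each file, doesn't load it whole
            {
                foreach (Match m in EmailPattern.Matches(line))
                {
                    unique.Add(m.Value.ToLowerInvariant());  // indiscriminate: surrounding context doesn't matter
                }
            }
        }
        File.WriteAllLines("extracted-emails.txt", unique);
    }
}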
7:03
So for a classic, you know what
7:05
I mean, normal data breach, it
7:07
might be that somewhere within that normal data
7:09
breach, you've got a couple of
7:11
million rows of user records and every
7:13
row has the username, the IP
7:15
address and the email address and it
7:17
just picks it up beautifully. If
7:19
somewhere else further down, there's also DMs
7:21
and in the DM, there's another
7:23
email address that appears, then that gets
7:25
all bundled in because it is
7:28
in the data breach as well. Context
7:30
is different, but still in the
7:32
data breach. So we run the
7:34
email addresses through there and we pick
7:36
these up. So we had 71 million
7:38
last time. Now the
7:40
stealer logs, when we went through
7:42
and we picked up the individual
7:44
instances of email address against the
7:46
domain of the website that they're
7:48
logging into. We don't do the
7:51
full URL. So if you're like
7:53
Spotify.com, you
7:55
just get
7:57
Spotify.com. When
7:59
we grabbed all
8:02
of those 220 million
8:04
unique rows of
8:06
domain and email address,
8:09
there were 69 million email addresses in
8:11
there. Now the gap between the 69
8:13
million and the 71 million was that
8:15
there were some email addresses in the
8:17
stealer logs that didn't sit cleanly against
8:20
a URL email password triplet. So
8:22
220 million rows, 71
8:25
million email addresses, 69 million
8:27
of which were against a URL
8:29
that we could extract. Now,
8:33
data that I've just been
8:35
dealing with, and then we'll talk about
8:37
the challenges of dealing with this much data. One
8:41
and a half terabytes' worth of
8:43
text files. 744
8:45
files. There
8:48
are 789 million
8:50
unique email address
8:52
and domain pairs.
8:55
So it was 220 million in the
8:58
last lot; this time it's 789
9:00
million. 284 million unique email
9:02
addresses. 244
9:05
million passwords that we've never
9:07
seen before have gone into Pwned
9:10
Passwords, and
9:12
199 million passwords that we
9:14
had seen before have had their
9:16
counts updated. Now,
9:19
this is not all live yet. This
9:21
is going live next week. We're just on
9:23
the final stage of the processing verification.
9:25
And I thought, given I
9:27
wasn't sure what else to talk about
9:29
today, and I did kind of rush
9:31
it this morning, we just go through
9:33
what was involved in processing this because
9:35
it is nuts. I
9:39
think in the blog post, I'll give a little
9:42
bit more background about how I came across this
9:44
and it was someone from a government somewhere was
9:46
like, look, have a look at this. And
9:49
they sent me these two files.
9:51
And I was like, there's a substantial
9:53
amount of new stuff in here. It's
9:55
obviously legitimate. And I
9:57
started going through the load process for Have
9:59
I Been Pwned. And then I just, as
10:01
I was doing the verification, I was looking
10:04
into it more and pulling the threads. And
10:06
I found the stash of the larger corpus. And
10:08
then I was like, OK, well, now I've got to go and grab all the
10:10
data. This will be fine.
10:13
So how do you deal
10:15
with that much data? There's
10:22
a point in this blog post where I'm like, I'm trying
10:24
to solve it in the cloud. I was like, the cloud was
10:26
not the solution. Let me just maybe
10:28
run through the mechanics of it. A
10:31
lot of this is just custom written code to try
10:34
and do things as efficiently as we can with really
10:36
large amounts of data. So one of the things we
10:38
have to do with a corpus like this is extract
10:40
all the email addresses. Now there
10:42
is an open source email extractor
10:44
tool to do this. For one
10:46
a half terabytes worth of data
10:48
as 744 files, it
10:50
took, I think it took somewhere in the order of a
10:52
day. Actually, I'll rephrase it. We got
10:54
to about three quarters of a day before it
10:56
crashed. Now, the
10:59
reason it crashed. And
11:01
one of the, just the number of times
11:03
I had to say to Charlotte, look, we
11:05
just paid for a bunch of cloud or
11:07
a bunch of time has been burned, only
11:09
to discover that this wasn't gonna work. The
11:11
reason it crashed is that there is one
11:13
line in one of the files that caused
11:15
an out of memory exception for the .NET
11:17
app that was reading it. Because
11:19
there is one line that
11:21
is so long, it crashed the
11:23
code. And unless
11:26
you're actually wrapping that in a try -catch
11:28
to catch that exception where you run out
11:30
of memory, everything
11:33
just bombs out. Now
11:36
in the email extractor, the
11:38
open source one, it wasn't obvious what
11:40
was happening. It gobbles up that
11:42
exception. In my own custom code
11:44
to extract the stealer log pairs, which
11:46
I'll come back to in a moment, I
11:48
was actually seeing the full exception message. Unfortunately,
11:50
I ran into that exception before I ran into
11:52
the other exception. So I ended up getting
11:55
to the point where we just had to stop
11:57
reading the file. That's why, for only one of the
11:59
744, I think we got three quarters of
12:01
the way through the file and had to stop
12:03
it on that exception. But that's the sort of
12:05
thing that you get a long way down
12:07
the road, hours and hours and hours and hours.
12:10
And then it just crashes and effectively all that
12:12
work gets unwound because it buffers up all of
12:14
the email addresses it finds in memory. And when
12:16
it gets to the end of it, then it
12:18
writes it all out to the file to be
12:20
efficient, which was a problem.
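For anyone curious, a hedged sketch of the more defensive shape this ends up needing: catch the out-of-memory case on a pathological line and write results out as you go instead of buffering them all; this is illustrative, not the real extractor code:

// Sketch: process line by line, survive one absurdly long line, and flush incrementally.
using System;
using System.IO;

class ResilientExtractorSketch
{
    static void Main(string[] args)
    {
        using var reader = new StreamReader(args[0]);
        using var writer = new StreamWriter(args[1]);   // results land on disk as we go, not in one big buffer at the end
        string line = null;
        while (true)
        {
            try
            {
                line = reader.ReadLine();               // a single multi-gigabyte "line" can throw here
            }
            catch (OutOfMemoryException)
            {
                Console.Error.WriteLine("Hit a line too long to buffer; giving up on the rest of this file.");
                break;                                  // matches what's described above: stop reading that one file
            }
            if (line == null) break;                    // normal end of file

            // ...do whatever per-line extraction is needed, then persist immediately.
            writer.WriteLine(line.Length);              // placeholder for the real processing
        }
    }
}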
12:25
So it seems lots of storage space and RAM is how
12:27
you deal with it. So, Stefan, I've
12:29
put a note on that, and in
12:31
fact, anyone can see this: github.com
12:33
/HaveIBeenPwned. There's a
12:35
repo, I think it's called email
12:37
extractor. I have raised an issue in
12:39
there. Stefan, if you wanna
12:41
see the actual file, I think I remember which one it
12:43
is, and I'll send it to you
12:45
and you can see it, but yeah, it crashed
12:47
out. Right, so we
12:49
can get the email addresses out eventually. But
12:53
the other thing that we need
12:55
to get out is the passwords.
12:57
Now, the passwords,
13:00
when a stealer log has a
13:02
predictable pattern, the three parts we've discussed
13:04
before, the URL, the email address, the
13:06
password. You can read a
13:08
line. You can split based on a
13:10
colon. So long as there are three parts,
13:12
then it's considered valid. And then you take
13:14
the last part and that's the password. But
13:16
then just to be safe, we've got a
13:18
little bit of validation in there which says
13:20
if this is an email address, discard it.
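As a concrete sketch of that split-on-colon idea: the description above is "split on a colon, three parts, last part is the password", and the version below splits from the right and re-joins the URL because the URL itself contains colons; the names and checks are assumptions, not the actual parser:

// Sketch: parse a stealer-log line of the form URL:email:password.
using System;

static class StealerLogLineSketch
{
    public static bool TryParse(string line, out string websiteDomain, out string email, out string password)
    {
        websiteDomain = email = password = null;

        var parts = line.Split(':');
        if (parts.Length < 3) return false;             // needs at least the three parts described above

        password = parts[^1];                           // "take the last part and that's the password"
        email = parts[^2].Trim();
        var url = string.Join(":", parts[..^2]);        // re-join whatever was left of the URL

        // Only the domain of the website is kept, e.g. https://spotify.com/some/path -> spotify.com
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri)) return false;
        websiteDomain = uri.Host;

        // The safety check described above: if the "password" looks like an email address, discard the line.
        if (password.Contains('@') && password.Contains('.')) return false;

        return true;
    }
}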
13:22
Because sometimes stuff gets messed up. And
13:24
incidentally, that's one of the reasons we store
13:26
passwords in Have I Been Pwned as hashes. So that if the
13:28
parsing does screw up and you end up with
13:30
the whole line with the URL and the
13:32
email address and the password. Or even if
13:35
you just end up with the email address. You're
13:38
not then redistributing PII, so if
13:40
people go and download all of these
13:42
passwords and there's a bad parsing
13:44
issue in there and you've got someone's
13:47
PII, that's a bad thing. If
13:49
it's an entirely hashed line, then
13:51
it's got a level of protection that's
13:53
probably not going to be broken unless
13:55
you're trying to crack hashes and somehow
13:57
you manage to guess the entire unparsed
13:59
line, which is extraordinarily unlikely.
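For context, Pwned Passwords is published as hashes (SHA-1 and NTLM), so a minimal sketch of hashing whatever the parser produced before it's stored might look like this; illustrative only, not the actual ingestion code:

// Sketch: store only a hash of whatever the parser produced, never the raw value.
using System;
using System.Security.Cryptography;
using System.Text;

static class PasswordHashSketch
{
    public static string Sha1Hex(string value)
    {
        using var sha1 = SHA1.Create();
        var hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(value));
        return Convert.ToHexString(hash);   // e.g. "P@ssw0rd" -> "21BD12DC183F740EE76F27B78EB39C8AD972A757"
    }
}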
14:03
Wayne says maybe a local box or VM, run it
14:05
through locally to ensure it works, then run in
14:07
Azure. Well, I'll come back to the stuff we've tried
14:09
to run in Azure and what didn't, didn't work. So
14:12
the passwords are not too much drama. But
14:15
the other thing is that we
14:17
need to, separately to that, get
14:19
out, because we're now loading domain
14:21
and email pairs. We need
14:23
to get those two out together.
14:26
So another little console app I've written goes
14:28
through, same thing, looks for the three
14:30
parts, and then it looks for the first
14:32
part. Is it a domain that passes
14:34
or rather is it a URL with a
14:36
domain that passes some basic validation? And
14:39
that validation includes things like there's
14:41
a regex for domain, there's a TLD
14:43
validator, is it on a valid
14:45
TLD? And then the same sort
14:47
of thing for the email address, it
14:49
validates the domain part with the same logic
14:52
and the alias has got another regex
14:54
and effectively a max length on it. And
14:56
that's it, now.
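A hedged sketch of that kind of validation; the actual regexes and the full TLD list live in the real tooling, so everything named here is an assumption for illustration:

// Sketch: basic domain + email validation of the sort described above.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PairValidationSketch
{
    static readonly Regex DomainPattern = new Regex(
        @"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    static readonly Regex AliasPattern = new Regex(
        @"^[a-z0-9._+-]{1,64}$",                     // pragmatic, not RFC-complete
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // In the real pipeline this would be a full TLD list; these few are just for illustration.
    static readonly HashSet<string> ValidTlds = new(StringComparer.OrdinalIgnoreCase)
        { "com", "net", "org", "au", "uk", "us", "nz", "io" };

    public static bool IsValidDomain(string domain)
    {
        if (!DomainPattern.IsMatch(domain)) return false;
        var tld = domain[(domain.LastIndexOf('.') + 1)..];
        return ValidTlds.Contains(tld);              // "is it on a valid TLD?"
    }

    public static bool IsValidEmail(string email)
    {
        var at = email.LastIndexOf('@');
        if (at <= 0 || at == email.Length - 1) return false;
        return AliasPattern.IsMatch(email[..at]) && IsValidDomain(email[(at + 1)..]);
    }
}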
14:59
That sounds easy. So you run this
15:01
and for every file it goes through
15:03
and it creates two files. It
15:05
creates one file for the valid entries
15:08
and one file for the invalid
15:10
entries. And I did this because I
15:12
wanted to see what was being
15:14
rejected to see if I was false
15:16
-positiving any stuff. So now
15:18
I've ended up with one file of all
15:20
the stuff we're gonna discard, and one file of the
15:22
good stuff, 744 times over. So
15:24
this is where we
15:26
ended up with 744 files
15:29
that had 789 million, and
15:32
I'll rephrase it because it's actually
15:34
more than that. We ended up, for
15:34
each file, with a distinct set of email
15:36
address and domain pairs. But
15:40
turns out there's a lot of
15:42
redundancy across the files. So at some
15:44
point in time, you've got to
15:46
de -dupe that redundancy. It's too much
15:48
data to just do the whole thing
15:50
in memory or so I thought.
15:52
That's where I started. So
15:54
I've got all this data sitting there in text files.
15:56
We've got to get it into SQL Server in Azure.
15:59
So it ended up being, I think I was
16:01
running it at night, compressing this stuff.
16:04
You can compress it down pretty well, but
16:06
then you're trading off. It's like, okay, I've
16:08
got to get it up to the cloud.
16:10
I'm trading off the time it takes to
16:12
compress with the gain that's made by uploading
16:14
a compressed file versus how fast is my
16:16
internet connection if I just drop it all
16:18
in a synced folder and I can access
16:20
it on a VM in Azure, which
16:22
is what I did. And
16:24
then you got to get
16:26
744 files with many
16:28
hundreds of millions of
16:31
rows into SQL Azure. So
16:33
SQL BCP is very fast
16:35
at this. So you can run
16:37
SQL BCP and it picks up a file
16:39
with the delimiter and it drops it in a
16:41
target table. It's dumb, it doesn't do any
16:43
transformations along the way, but so long as your
16:45
data is structured and consistent and it goes
16:47
into columns that match, you're good.
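For comparison, the same dumb-but-fast bulk load can be driven from .NET with SqlBulkCopy (bcp itself is the command-line tool being described here); a sketch with hypothetical table and column names:

// Sketch: bulk-load tab-delimited rows into a staging table, no transformations.
using System.Data;
using System.IO;
using Microsoft.Data.SqlClient;

static class BulkLoadSketch
{
    public static void Load(string connectionString, string path)
    {
        // A real load of this size would stream via an IDataReader rather than buffer a DataTable in memory.
        var table = new DataTable();
        table.Columns.Add("WebsiteDomain", typeof(string));
        table.Columns.Add("EmailAddress", typeof(string));

        foreach (var line in File.ReadLines(path))
        {
            var parts = line.Split('\t');
            if (parts.Length == 2) table.Rows.Add(parts[0], parts[1]);
        }

        using var bulk = new SqlBulkCopy(connectionString)
        {
            DestinationTableName = "dbo.StealerLogStaging",   // hypothetical staging table
            BatchSize = 100_000
        };
        bulk.WriteToServer(table);
    }
}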
16:50
So we do that, and that takes about
16:53
a day, even when you're there
16:55
on the local network from a virtual
16:57
machine into a SQL Server database
16:59
on the same network. So
17:01
I did all that and then
17:03
okay, well that's cool. So now
17:05
we've got the website domain and
17:07
you've got the email address. But
17:10
to get it into the data
17:12
structure that we're moving towards, we
17:14
need to split that email address
17:16
into alias and domain pairs. And
17:18
we also need to get
17:20
a unique set of website
17:23
domain and email address.
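Concretely, that split is simple; a sketch, with the caveat that the real code normalises and validates more carefully:

// Sketch: split an email address into alias and domain, and key uniqueness off the website-domain/email pair.
static class EmailPartsSketch
{
    // "troy@example.com" -> ("troy", "example.com"); assumes the address has already been validated.
    public static (string Alias, string EmailDomain) Split(string email)
    {
        var at = email.LastIndexOf('@');
        return (email[..at], email[(at + 1)..]);
    }

    // The unit of uniqueness discussed here is the website-domain/email pair.
    public static string PairKey(string websiteDomain, string email) =>
        $"{websiteDomain}\t{email}".ToLowerInvariant();
}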
17:25
Now, SQL is really good at doing unique. So
17:27
you've got a distinct clause. So we can say,
17:29
well, a distinct operator, you know what I mean? So
17:32
we can just say select star
17:34
from stealer log. And instead of that,
17:36
it's like select distinct website domain
17:38
and email address from stealer log. Job
17:40
done, no problems. And
17:43
it's a big data set. So I scale SQL
17:45
Azure all the way up to 80 cores on
17:47
hyperscale. Now 80 cores on hyperscale is costing
17:49
us, I think, 800 something Aussie dollars a
17:51
day. So let's call it, let's call it
17:53
nearly 600 US dollars a day. Which
17:56
is not fun, but it's short term.
17:59
And we do charge for keys and some other services
18:01
so we can kind of justify it. So you
18:03
turn the knob all the way up and then you
18:05
start running the query. And
18:07
then you wait. A
18:12
day goes by. And
18:14
you don't have any idea how long a query
18:16
is going to take as well. And I've got my
18:18
big chart of graphs and everything up on the
18:20
wall, and I'm just looking at it, it's back down
18:22
to four now, but it was sitting at like
18:24
80 cores. But the
18:26
CPU percentage wasn't high, and I'm trying to find
18:29
some sort of metric to show that there's progress.
18:31
The space used, I can
18:33
see the space used is changing through
18:35
some of the processing because we're selecting out
18:37
of one table, selecting a distinct into
18:39
another table, just trying to like filter down,
18:41
start the funnel wide and filter it
18:43
down into smaller and smaller datasets. But
18:46
long story short, we ended up
18:48
getting through it. I think we
18:51
got about two days into it,
18:53
and we'd now burned one and a
18:55
half thousand Aussie dollars' worth of
18:57
money on SQL Server cores. It
19:00
hasn't finished. There's no indication
19:02
when it will finish. So
19:05
I ended up realizing that it turned
19:07
out to be much more efficient to go
19:09
all the way back to .NET code
19:11
and just do it locally in .NET rather
19:14
than trying to do it up on
19:16
the cloud. Now, not only locally
19:18
in .NET, but doing it on this
19:20
machine, I found a spare 10 terabyte
19:22
disk laying around, chucked that in the
19:24
machine, and managed to get all the uniques
19:26
out.
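A sketch of the kind of local de-dupe that ended up winning here, assuming the machine has the RAM for a big HashSet; the real processing and file layout will differ:

// Sketch: de-dupe website-domain/email pairs across hundreds of files, locally, onto a big disk.
using System;
using System.Collections.Generic;
using System.IO;

class LocalDedupeSketch
{
    static void Main(string[] args)
    {
        // Hundreds of millions of entries in a HashSet needs serious RAM; sort-and-merge on disk is the fallback.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        using var writer = new StreamWriter(Path.Combine(args[1], "unique-pairs.tsv"));

        foreach (var path in Directory.EnumerateFiles(args[0], "*.tsv"))
        {
            foreach (var line in File.ReadLines(path))          // stream each file
            {
                if (seen.Add(line))                             // first time we've seen this pair
                {
                    writer.WriteLine(line);
                }
            }
        }
        Console.WriteLine($"{seen.Count} unique pairs written.");
    }
}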
19:29
Then the same job again: into a synced folder, up to the virtual
19:31
machine, SQL BCP into the
19:33
database, managed to get all the unique
19:35
domains out of that as well because we've
19:37
got domain as a foreign key into a
19:39
table in Azure. So we need to get
19:42
a list of all of the domains that
19:44
are in these cell logs. And
19:46
then all of the ones we haven't seen before, and
19:48
then there are something like six million domains that
19:50
we hadn't seen before. We've got about, I think, 300
19:52
million in the table. Incidentally, domain
19:54
includes a subdomain. So you get a
19:56
lot of very unique subdomains in
19:58
these exceptions. All those go into the
20:00
table. And where
20:02
we're at now is, we've got... what
20:06
did I say it
20:08
was? We've got 789 million
20:10
rows in one table that
20:12
have the website domain, the
20:14
email alias and the email
20:16
domain. And
20:18
what we have to do over this
20:20
weekend is scale it all the way
20:23
back up to 80 cores, load the
20:25
284 million unique email addresses like a
20:27
normal data breach. That'll take, I don't
20:29
know how long, maybe half a day
20:31
a day. Load
20:33
that like a normal data breach. And
20:35
then we need to insert the 789 million
20:37
records into the stealer log mapping table that
20:39
maps a domain to a breached email address.
20:41
Breached email address has now been populated because
20:43
we've loaded that breach, not made it
20:45
live, just loaded the breach. So
20:48
we need to get a distinct
20:50
out of the 789 million where
20:52
the domain and email address have
20:54
not been seen in the previous
20:56
set of stealer logs. And I
20:59
worry that that's going to take
21:01
multiple days. And
21:04
if I screw it up, I can burn
21:06
multiple days with nothing having happened and then have
21:08
to roll it back. That
21:12
sounds long and laborious and
21:14
it misses about 90 % of
21:16
what I had to do. At
21:18
one point in time after
21:20
realizing that trying to query the
21:22
entire corpus of email address
21:24
and domain pairs was a futile
21:26
effort and I had to
21:28
cancel. I thought, you know what
21:30
I'll do? I'll try and
21:32
batch it. So what if we try and batch it? And
21:35
we just process like a million at a time
21:37
and we just iterate through. And
21:39
I spent a couple of hours writing the
21:41
code to do this, testing it locally.
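For context, this is roughly the shape of that batching idea, assuming the auto-incrementing ID column described next; the table and column names are hypothetical, not the real schema:

// Sketch: work through a huge table in fixed-size slices keyed off an incrementing ID column.
using System;
using Microsoft.Data.SqlClient;

static class BatchedProcessingSketch
{
    public static void Run(string connectionString, long maxId, long batchSize = 1_000_000)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        for (long start = 1; start <= maxId; start += batchSize)
        {
            long end = Math.Min(start + batchSize - 1, maxId);

            // Hypothetical table/column names; each slice only inserts pairs not already in the target.
            using var command = new SqlCommand(@"
                INSERT INTO dbo.UniquePairs (WebsiteDomain, EmailAddress)
                SELECT DISTINCT s.WebsiteDomain, s.EmailAddress
                FROM dbo.RawStealerLogRows s
                WHERE s.Id BETWEEN @start AND @end
                  AND NOT EXISTS (SELECT 1 FROM dbo.UniquePairs u
                                  WHERE u.WebsiteDomain = s.WebsiteDomain
                                    AND u.EmailAddress = s.EmailAddress)", connection);
            command.CommandTimeout = 0;                      // individual batches can still take a while
            command.Parameters.AddWithValue("@start", start);
            command.Parameters.AddWithValue("@end", end);
            command.ExecuteNonQuery();
        }
    }
}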
21:44
And while I was doing that, I thought,
21:46
well, what I'll do is I'll just
21:48
drop an ID column onto the data
21:51
that had already been loaded. An ID,
21:53
auto-incrementing number. That
21:57
took. About
22:00
a day and a half to run. It
22:02
took our storage from, it went from
22:04
like two terabytes up to four terabytes
22:06
something to add a column. I've got
22:09
to screen grab this, I'm gonna put
22:11
it in the blog post, it's ridiculous.
22:14
Just to add this column. I
22:19
gotta find the image now.
22:21
This was not data storage
22:23
size, here we go. Not
22:27
for the last 24 hours, but
22:29
we're gonna do this for the last
22:31
week, because that's how long we've been
22:33
mucking around with this stuff for. Is
22:37
that the right thing? Am I on
22:39
the right database? Data storage, so maybe it
22:41
was capacity or used or something like
22:43
that. Data space used, that's what we're after. Yeah,
22:46
so I've got this really wild chart
22:49
now where, like, normally we're
22:51
using about one terabyte worth of data, and then
22:53
I SQL BCP up a bunch of it
22:55
and it goes like this But I had to
22:57
do it in two parts so it flat
22:59
lines a bit. We'll wait for the next part
23:01
and then it goes like this, and then
23:03
it flatlines for ages while I'm realizing the
23:05
fact I just can't run this distinct query,
23:07
and I drop in this auto-incrementing column, and then
23:09
we've gone from, like, where were we? We're
23:12
at like 994 gigabytes worth of space
23:14
used, then I SQL BCP everything up and we're
23:16
now at two terabytes. We add
23:18
a column, we go from two terabytes
23:20
up to 3.7 terabytes, and then
23:22
it sits flat for a while,
23:24
and then, this is just like
23:26
an alter-table-add-column kind
23:28
of thing, and then it goes all the way
23:31
out to 4.6 terabytes, and then
23:33
suddenly it drops from 4.6 terabytes
23:35
down to 1.9 terabytes, which is
23:37
roughly where it's sitting now. And
23:41
all the while, I'm just like
23:43
feeding money into the Azure machine.
23:45
Anyway, the sort of lesson
23:48
I learned on that was to do
23:50
as much local processing as possible in
23:52
.NET. And the relational database stuff is
23:54
great for queries and joins and things
23:56
like this. But for actually filtering down
23:58
the data, it was just a bad
24:00
idea. And so I think somewhere in
24:02
this blog post, I've made the comment
24:05
that the cloud is not the solution
24:07
to all your problems. And
24:09
so many times, even when I've been talking about getting,
24:11
specing a computer, you know,
24:14
I try and spec computers
24:16
pretty high. And people
24:18
go, why are you doing that? Why don't you just do it all in the
24:20
cloud? There's loads and
24:22
loads of stuff, which is fantastic about
24:24
the cloud and loads of stuff that's a
24:26
nightmare. I'm going to screen grab this
24:28
graph now. I'm going
24:30
to put that in the blog post. Because
24:34
I can see it's like a roller coaster
24:36
of emotions, as all
24:38
the capacity is used. And
24:41
I can just see, you get charged based on
24:43
the number of cores you're running and also based on
24:45
the amount of storage. How
24:47
much did this end up costing me? I
24:50
mean, normally our database
24:52
runs pretty efficiently. Drop
24:54
that in the blog
24:57
post there. But
25:00
I have been a bit scared to look. Now
25:02
actually, that reminds me. So I'll talk about what
25:04
we want to do with some pricing stuff as well.
25:07
And I'll show it, so I
25:09
should talk about it
25:11
here and get people's opinions
25:13
and see if this
25:15
sounds reasonable and sensible.
25:17
Here's my dashboard. So
25:19
while we're running those big
25:21
instances... okay, there's a
25:24
bit more than I thought
25:26
that was Aussie
25:28
dollars. Aussie dollars, so take off
25:30
about a third for American dollars:
25:32
one day was a thousand and
25:34
twenty-six dollars, another day was a
25:36
thousand and forty-nine dollars, another
25:38
day was nine hundred and forty-four
25:41
dollars. But normally we would be
25:43
running at one hundred something dollars
25:45
for a day of SQL Azure
25:47
to support the 14
25:49
odd billion records and the gazillion
25:51
requests we get every day so
25:53
normally it's pretty good. Last
25:55
few days, not so much. Let me look at the comments
25:57
and I'll talk about how we want to try and structure this.
26:02
Daylight saving stuff. I
26:07
think we just want to make everything as
26:09
fast as we can. Like, we want
26:11
fast. Internet is actually pretty good here in
26:13
Australia now. We do get reliable gigabit down. I
26:15
get about 400 megabits up, which for the
26:17
most of what we do is really good. James
26:21
says where I live, often
26:23
the upload is way slower than down. Yeah, exactly. Yep,
26:26
you don't have quite that
26:28
parity. James
26:31
is having a look at the email extractor
26:33
part and curious what you determined is a valid
26:35
email. So one of the
26:37
things, mental note, Stefan: one of
26:39
the things we have to do,
26:42
is we don't yet have consistent
26:44
logic across every point
26:46
in the app about what makes a valid email
26:48
address. But have a look at the email
26:50
extractor app and have a look at the issues. There's a lot
26:52
of discussion there. It
26:54
is impossible to write a regex
26:56
which meets the RFC for an
26:58
email address and it doesn't matter.
27:00
And the reason it doesn't matter
27:03
is that so many people struggle
27:05
with even slight but valid variations
27:07
of email addresses. So I know
27:09
a lot of people struggle with
27:12
various websites that won't accept say a plus in
27:14
the alias, which is perfectly valid and useful and
27:16
a lot of people use it for sub addressing.
27:19
So when you get to weird things like you
27:21
can have another at symbol in the alias
27:23
so long as it's escaped or something to that
27:25
effect, no one's legitimately using
27:27
that. Like your life would be
27:29
miserable trying to enter that email
27:31
address into places. So
27:34
we have to strike this
27:36
balance where non-alphanumeric
27:38
characters, other
27:40
than I think a dash and maybe an
27:42
underscore and a dot and a plus,
27:44
are not valid in the alias.
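As a concrete illustration of that trade-off, a pragmatic alias rule along those lines might look like this; the exact character set and the length cap are assumptions, not the production rule:

// Sketch: a deliberately strict alias check that rejects RFC-legal-but-never-used syntax.
using System;
using System.Text.RegularExpressions;

static class AliasRuleSketch
{
    // Letters, digits, dot, dash, underscore and plus only, with a sanity cap on length (the cap is an assumption).
    static readonly Regex Alias = new Regex(@"^[A-Za-z0-9._+-]{1,64}$", RegexOptions.Compiled);

    public static bool IsAcceptableAlias(string alias) => Alias.IsMatch(alias);

    static void Main()
    {
        Console.WriteLine(IsAcceptableAlias("troy.hunt+test"));   // True: sub-addressing with a plus is kept
        Console.WriteLine(IsAcceptableAlias("\"weird stuff\""));  // False: RFC-legal but nobody really uses it
    }
}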
27:46
And we've just then got to do that consistently
27:48
everywhere. And it doesn't matter
27:50
if we have a false positive because we've
27:53
then got this, let's say in this case,
27:55
hundreds of millions of email addresses. If one
27:57
gets excluded because it's this valid, weird syntax
27:59
that one strange person uses somewhere that managed
28:01
to accept it in a website, it doesn't
28:03
matter if that's not included. So
28:05
we're trying to weed out
28:07
the junk. and keep the legitimate stuff
28:10
as much as possible. But have a look at that
28:12
repo. There's lots of discussion there. And
28:14
in fact, that repository does need to have
28:16
an update. Because when I look at
28:18
the email addresses that get extracted out of
28:20
data, there's still a bit of junk
28:22
in there. And when I say a bit
28:24
of junk, it's like a tiny fraction
28:26
of 1%. And nothing too bad happens if
28:28
the junk goes in either. But
28:31
it'd be nice to have as little
28:33
as possible. Yeah,
28:37
so Wayne, to your suggestion:
28:39
maybe local box or VM, run
28:41
it through locally to ensure
28:43
it works, then run in Azure. Yeah,
28:45
even these queries that can
28:47
take days to run Do you
28:49
want to run it locally
28:51
for days? And
28:53
then have to run it
28:56
remotely? I think the
28:58
overarching theme is filter and
29:00
minimize as much as possible
29:02
locally and get the cleanest
29:04
set possible up into SQL
29:06
Azure. And what the
29:09
cleanest set possible means is that
29:11
when we upload the email addresses,
29:13
it's like any data breach, just
29:15
email addresses. When we're dealing with
29:17
the domains from Stealer Logs, we
29:19
just have a distinct set of
29:21
domains, then it has to go
29:23
up into SQL Azure as the
29:25
full distinct set. Because then
29:27
we need to see which ones we already
29:29
have and which ones we don't. And of course,
29:31
when we upload the stealer logs themselves, the
29:33
mapping between the websites and the email addresses,
29:35
we have to get that entire mapping up
29:37
there. But if we can just get a
29:39
distinct list, then that's the smallest possible footprint
29:42
we can have. And then they
29:44
aren't there. Yeah, see, the only thing we
29:46
can do a little bit differently is try
29:48
and save ourselves some joins later on. Because
29:50
when you end up with 789 million rows
29:52
in a table, and you've
29:54
got website domain, email alias, email
29:56
domain, you're basically doing
29:58
three joins from 789
30:00
million records onto the
30:02
other tables. Now what
30:05
we could do, and this
30:07
is where you blow
30:09
heaps of time and mental
30:11
effort, we could
30:13
add ID columns for
30:18
the two domains, and then we
30:20
could literally just go through and
30:22
update that one table. with one
30:24
join onto the domain table for
30:26
one column and then another join
30:28
for the other column. You could
30:31
run as two separate queries. And
30:33
that would help save the join when
30:35
you do the big insert later on or
30:37
it helps save two joins. But
30:40
by doing two smaller queries
30:42
now, will that be more efficient?
30:44
It does your head
30:46
in. And
30:48
every experiment costs
30:50
$1,000. And
30:53
I can't figure out a way to do that more
30:56
efficiently. Anyway, what
30:59
else we got here in the comments? Yeah,
31:03
so Wayne's saying, yeah, I do it easier. James
31:07
says, I found the part of what I was looking for,
31:09
man, I hate regex. Yeah,
31:12
yeah, yeah, yeah, I know. I
31:14
actually think we can simplify that
31:16
a lot. if we're a bit
31:18
more brutal about excluding things that
31:20
are spec compliant, but people just
31:22
don't use. Wayne
31:26
says, so cost around 4k more
31:28
than usual. Well, yeah, and that's just,
31:30
yeah, yeah, yeah, pretty much mate. So
31:33
we will come back to the pricing thing in a
31:35
moment because we've got to make this back somehow. Wayne
31:41
says, make it a library and share it across
31:43
the solution in terms of email validation. Now you
31:45
can do that with all the dot net bits,
31:47
but also, on the SQL
31:49
database side of things, we've got a
31:51
couple of functions, which validate the same
31:53
thing, because there are times when we
31:55
wanna run that across data that's in
31:57
SQL Server that hasn't already been run
31:59
through that same logic outside of that.
32:03
But yes, you are right, at the very least,
32:05
we could have a like, maybe we should even
32:07
just, I know Stefan had to
32:09
go. If you
32:11
listen to this later, Stefan, maybe we could
32:13
just make this a standalone open
32:16
source library and it can go up into
32:18
NuGet or something like that. That might be
32:20
a good idea too. If
32:22
anyone feels like doing that, James says
32:24
plus and dot are common. I added a
32:26
dot to my sister's email address so
32:28
she could create a second account on the
32:30
site. So for those of you not
32:32
getting what James is saying here, often the
32:34
dot part of the alias is ignored, so
32:37
if you're john@gmail.com, you
32:39
could also be jo.hn@gmail
32:41
.com, and that is viewed as a
32:43
unique email address by websites, including
32:45
Have I Been Pwned, but it's actually
32:48
the same thing. I'm not sure
32:50
that it is part of the RFC,
32:52
that the dot part of the
32:54
alias gets ignored, or if it's just
32:56
a gmail implementation. Wayne
33:00
says automate it: if it passes locally, then
33:02
we do it. Yeah, and again, as
33:05
a rough theme, that's true.
33:07
The thing is, like the
33:09
computer sitting here on my
33:11
desk is effectively sunk cost. The
33:13
money's already been paid. I
33:15
specced it pretty high. It's five years old now, but
33:17
it's still a good machine. I was talking recently about
33:19
getting another one. I think I'm just gonna stick with
33:21
this because it's actually, for the most
33:24
part, doing everything I need. Using
33:27
this to the fullest extent possible and getting
33:29
the smallest amount of data possible up to the
33:31
cloud for processing is the big lesson out
33:33
of this. And I'll make a note, gonna
33:36
make a note of that in here. Yeah, somewhere
33:38
here. Yeah. I
33:41
did say the cloud, my friends, is
33:43
not always the answer. And
33:46
I'm gonna make
33:48
another comment here. Do
33:50
as much processing
33:52
as possible locally. Yeah,
33:54
and that's. I'm
33:57
going to emphasise that. Neil
34:00
says, morning, I was going to catch you
34:02
live. We're not in the UK, currently heading down the
34:04
road to Byron Bay. Now
34:06
make sure you take your Instagram to
34:08
Byron Bay. My understanding of Byron Bay
34:10
is unless you have Instagram, you will
34:12
not get in and you will have
34:14
to post many, many photos of everything.
34:16
Actually, we had a Norwegian friend visit
34:18
us yesterday, had been to Byron. I'm
34:20
saying Byron, I don't want to
34:22
get completely off topic here, but Byron
34:24
was really nice up until probably
34:26
2000 where it was sleepy and beautiful
34:28
and it's just it's it's just
34:30
been flooded for reasons I don't understand
34:32
by tourists and celebrities and Instagrammers
34:35
and I vehemently dislike coming in now
34:37
and I know a bunch of
34:39
people that have left there people like
34:41
Patrick Gray who does risky business
34:43
used to live there and moved out
34:45
I think a lot of it
34:47
because of that sort of thing and
34:49
there's lots of nice quiet places
34:51
but you'll have fun anyway Wayne
34:54
says, I think you can run C# in
34:56
SQL server, not sure that works in Azure.
34:58
Yeah, that's been there for a while, and
35:00
I think there are some limitations with doing
35:02
that in Azure. But if we distill this
35:04
email address and domain validation logic, so there
35:06
are two different bits here, and of course,
35:08
part of email address validation is domain validation
35:10
as well. If we distill that down into
35:12
simple rules, then it should be easy to
35:14
have it in both. Matthew
35:17
says, did you end up finding a solution for your new
35:19
light switches and IoT dimmer switches. I've had to
35:21
go through a similar process. No,
35:23
but the good money at the moment
35:26
is on still having traditional light switches and
35:28
Shellys. Our sparky has collected several different
35:30
options, which he's going to bring over once
35:32
we get back from this trip and
35:34
then we'll have a few different things on
35:36
the wall. I'll post some photos. All
35:38
right, let me move on with this. There's
35:41
the next part of this stealer log
35:43
bit. You know, last month I wrote
35:45
this blog post and said it's experimental,
35:47
we're just seeing if it works. Well,
35:50
loading this data up, that
35:52
being the data that maps the email
35:54
addresses and the websites. Now,
35:57
it turns out it did work well. A
35:59
lot of people found that really, really useful.
36:01
A lot of people wanted more ways to
36:03
query that. So for example, we
36:05
got a bunch of requests
36:07
from organizations who are like,
36:09
look, we run acmecore.com. And
36:12
we would like to see the
36:14
email addresses that have been entered against
36:16
AcmeCore.com and then appear in the
36:18
logs because they are our customers
36:20
and they are having their accounts taken
36:22
over. How can we get access
36:24
to that data? And
36:27
we have not done that before. The
36:29
bit that we did do last time
36:31
around last month is we made sure
36:33
that if you, I'll
36:36
just pick Pfizer as an example given
36:38
my long tenure at the place. If
36:41
you're InfoSec at Pfizer and
36:43
you control Pfizer .com, then
36:46
you could get a list of
36:48
your Pfizer people, so
36:50
troy.hunt@pfizer.com, as it used
36:52
to be. You could get the
36:54
list of those email addresses, but you could
36:56
also get a list of which sites their
36:58
data had been captured against, because
37:00
effectively all that ties back to your domain. Now,
37:05
a lot of what I've done,
37:07
particularly as I've gotten to
37:09
the tail end of all this
37:11
processing is to try and articulate
37:13
the correct nomenclature about stealer logs.
37:15
And I've settled on website domain.
37:18
So we've got a situation where
37:20
we've got the website domain being
37:22
the... if someone's logging onto
37:24
Spotify and there's a great big
37:26
string of path and crap after
37:28
it, the website domain is Spotify
37:31
.com. You've then got the alias; let's
37:34
say it's troy.hunt, we'll pick
37:36
Pfizer again, and the email domain,
37:38
pfizer.com. So we've got these three
37:40
parts. So what we're
37:42
wanting to do here is
37:44
make available the ability for
37:46
someone who controls the website
37:48
domain of the stealer log
37:50
to see the email addresses
37:52
that appear against it. And
37:56
that's what we're gonna do here. And
37:58
the thinking, and this is why Charlotte said talk about it on
38:00
the video, to see what people think. The thinking
38:02
at the moment is that The
38:05
demographic that this is going
38:07
to be relevant for is largely
38:09
organizations running popular services that
38:11
lots of people authenticate to,
38:13
i.e. big websites. The
38:16
top sites, in fact, you know what, I've got a
38:18
list of the top sites here. I can read it. Now
38:21
this is the top sites. Here
38:26
we go. So this
38:28
is just some of the top sites from the last set
38:30
of stealer logs. So it's already in SQL Server. Up
38:33
in Azure, so I could
38:35
query it. So the top sites
38:37
are things like mega.nz, twitter
38:39
.com, netflix.com, zoom.us, discord.
38:43
And it's fascinating to go through the list
38:45
here. Wise, you know, wise, which used
38:47
to be TransferWise, is for transferring money. These
38:50
all have tens or hundreds of
38:53
thousands or millions as of the
38:55
last set of stealer logs, before
38:57
you get to this lot of
38:59
email addresses that appear against them.
39:02
And what we want to do is we want to
39:04
make that searchable. And what
39:06
we thought we'd do is domain
39:08
verification. So you've got let's say
39:10
it's wise, for example. So
39:12
you've got to demonstrate that you control wise
39:14
.com. Now if you can demonstrate that you
39:17
control wise.com at the moment, as of
39:19
today, you can go and pull back
39:21
all of the email addresses at wise.com
39:23
and the data breaches they've been in. What
39:25
we want to do is
39:27
create a Pwned 5, which
39:29
is like a Pwned 4, but
39:32
everything is doubled. Now, the
39:34
domain size is always unlimited. The RPM would double, so
39:36
would go from 500 requests per minute to 1,000
39:39
requests per minute if you want to use the
39:41
API. The cost would double,
39:43
so would go from, what are we
39:45
at, 1,300 something US a year
39:47
to like 2,600. And
39:49
then, if you have the
39:51
Pwned 5, you would be able
39:53
to say, give me a list of email addresses
39:56
whose credentials are being captured via
39:58
stealer logs when logging onto wise.com.
40:01
So that's the idea. You would need
40:03
a Pwned 5, which is going to
40:06
help pay for all the cloud, which
40:08
we just spent processing this data. And
40:10
also, the ongoing storage: the cost doesn't
40:12
stop once you finish the processing because you're still
40:14
sitting on the data. So
40:18
if you're listening to this now, listen
40:20
to it later on, and you have
40:22
thoughts on this. is that reasonable and
40:24
commensurate, to say we create a Pwned 5, and you
40:26
need a Pwned 5 in order to search
40:28
the email addresses entered against your domain
40:30
that end up in stealer logs. My
40:33
gut feel, and this is the
40:35
gut feel with all the pricing, it's not just
40:38
gut feel, this comes back from people, is that we
40:40
are very, very bad at pricing and everything is
40:42
ridiculously cheap. And I'm quite okay with that. I
40:44
think that's actually good because we want
40:46
it to be accessible as well. But
40:49
give me your thoughts on that. I'm
40:51
drafting the blog post around that. We do actually
40:53
have a Pwned 5 sitting there in the background.
40:55
It's just not visible on the Have I Been
40:57
Pwned page. And we do
40:59
also have a KB article, which
41:01
effectively says if people want higher
41:04
rate limits, let us know that
41:06
we can create these. We create products in Stripe
41:08
and we create all the gubbins in Azure
41:10
API management. So that's sitting in
41:12
the background, but we'll bring it to the
41:14
foreground. It'll sit there. You can choose that.
41:16
And one of the features will be stealer
41:18
logs against your website domain. Okay,
41:22
let's see where the
41:24
comments here. Wayne
41:28
says, have you seen that
41:30
Shelly brought out switches covers for
41:32
the relays? I
41:35
know other people can't post links here. I should change that.
41:37
I don't know if I can change this somewhere. Switches
41:41
covers. Shelly switches covers. Can I just
41:43
Google that? Look, come on. Shelly
41:47
switches covers. Shelly
41:51
switches can cover control covers
41:53
and shut it. Oh, no, that's
41:55
the wrong thing Drop me
41:58
a cover on the Shelly website
42:00
Gen2 cover No, that's a
42:02
different thing. Can you drop me
42:04
a message somewhere? Who
42:07
said that? What
42:09
day is it? Where am I? Wayne,
42:11
yeah, drop me a message somewhere, I'd actually be
42:13
kind of interested to see that, but then
42:15
I see you say they're pretty basic. James
42:19
says the reason I like those is
42:21
because they can run open firmware, and Martin
42:23
Jerry will ship them with the open
42:25
firmware. Is
42:27
this about the Shelly's running
42:29
Open Firmware? If it is,
42:31
I have heard this discussion
42:33
before. I know people run, I
42:37
forget what the firmware is. I'm
42:39
just not sure what else I need
42:41
that the Shelly doesn't already do. If
42:45
you've got a good answer at what's in the firmware, let me
42:47
know. I
42:50
mean, if you said for momentary issues and momentary switches,
42:52
I mean the physical switch, I want something that shows the
42:54
current state of the switch, even if you control it via
42:56
IoT. I think there's a really
42:58
interesting discussion because you're trying to marry the physical
43:00
state of the switch to the digital state of
43:02
the light. Now, if people listen to this going,
43:04
what the hell are you talking about? You
43:07
know, like a normal rocker switch, no IoT. Let's
43:10
say when the rocker is up, for
43:12
example, the light is on, and when the
43:14
rocker is down, the light is off,
43:16
and you can eyeball the switch, and you
43:18
know immediately, without looking at the ceiling,
43:20
whether the light's on or off. But
43:22
if you've got a physical rocker, and then
43:25
you've got a digital toggle, then
43:27
the physical switch isn't changing when the
43:29
state of the light changes, so you
43:31
can effectively get out of phase. So
43:34
this is the reason I got momentary
43:36
switches, just push buttons. There's
43:39
a UX aspect of that I just don't
43:41
like, but unless you're physically moving the switch as
43:43
well, although digital with some sort of a
43:45
light or something, I don't like lights behind switches
43:47
a lot of the time. So
43:51
yeah, I agree with you, Matthew. If you have any great ideas on
43:53
that, let me know. Josh
43:57
says, what do you mean by Pwned 4 or
43:59
Pwned 5? Is that the name you
44:01
give your products? Yeah, so just to fill
44:03
in the gaps there, if you go to
44:05
haveibeenpwned.com and you go to API and
44:07
you go to API key, you
44:09
will see that there are four
44:11
levels of subscription based on either the
44:13
rate at which you want to
44:15
query the API, or the size of
44:17
your domains if you're doing domain
44:19
searches. And you'll see they go Pwned
44:23
1, which starts at $3.95 a
44:25
month, up to Pwned 4. So Pwned
44:25
5 sits on the end of that,
44:27
and it just doubles everything on
44:29
4 and adds the ability to search
44:31
stealer logs by website domain if
44:33
you can demonstrate control of the domain.
44:35
So that's it. Matthew
44:39
says have a look at the Hager Finesse press
44:41
mechs. Yeah. So that's one of the
44:43
ones we have coming. So the Sparky
44:45
was quite keen on Hager Finesse. We've
44:48
got Clipsal Saturn Zen at the moment,
44:50
which are absolute ratchet. They're the worst.
44:52
I cannot express how much I dislike
44:55
these switches, even like the, the
44:57
number of faceplates on them in our
44:59
house. And we've only, oh, let's,
45:01
let's say we've got maybe a dozen
45:03
of them. The number of them
45:05
that are held on by BlueTac now.
45:07
because when you remove them, the
45:09
clips break. It's just ridiculous. And the
45:12
mechs are terrible. Just, they're terrible. I'll
45:14
let you know how I go with the Hager Finesse. So that's
45:16
coming probably next week. I think we'll have one of those
45:18
as a test switch in somewhere. And
45:20
then there's a couple of others as well.
45:22
One of them we are trying, which is
45:24
IOT enabled. In fact, it's a ZigBee IOT
45:27
enabled switch. So we'll see how that goes. Janice
45:30
is ESP home and test motor. Okay, test motor
45:33
is what I was thinking of. You might not
45:35
need anything extra now, but it's nice to future
45:37
-proof for stuff that can run open firmware if
45:39
they aren't supported later. Yeah.
45:45
But then I think in terms, I
45:47
actually think the future -proofing aspect is
45:49
the abstraction of the physical
45:51
switch from the digital implementation. Because
45:55
if we had that, then
45:57
you can always change the digital implementation
45:59
later, but then you end up with
46:01
a spaghetti of wires and shellies and
46:03
stuff in your wall behind it. Wayne
46:05
wants a Troy Hunt HIBP Discord server.
46:07
I don't need another thing to do.
46:10
Speaking of things to do, the
46:12
final note here is that we
46:14
are now getting very refined in
46:16
our Have I Been Pwned rebrand.
46:18
We've got a logo, which I
46:20
think is great, that's working really
46:22
well. We have a dedicated resource
46:24
rebuilding the front end for Have
46:26
I Been Pwned. I think it
46:28
looks fantastic. And I reckon
46:30
that we're maybe a couple of weeks out
46:33
from probably open sourcing the repository where
46:35
he's doing the front end work, inviting people
46:37
in to come and have a look, make
46:40
improvements. We've ended up
46:42
going down the bootstrap five route. I was asking
46:44
questions about that. I remember where I was. I was
46:46
on the plane flying back from Europe asking questions
46:48
about this. So it would have been late December. So
46:51
that's the direction we've gone. Really
46:53
happy with the way it's looking. We've also
46:55
got a proper designer. This
46:57
is what happens when you start charging money
46:59
for stuff. You've got money to spend
47:01
making it better. So we've got a proper
47:03
designer, who's done the
47:05
brand and is now doing things like stickers and swag
47:07
and stuff like that. So we might not have a
47:09
discord server, but we've got some other cool stuff coming.
47:13
Matthew likes the separation of switch and
47:15
Shelly. Then you can repurpose switches for
47:17
another action. Yeah,
47:20
everything is a trade-off. The
47:22
trade-off is... the other
47:24
one, what was the sparky saying in the
47:26
text message? The
47:28
other... what's its
47:31
name? The
47:33
other switch that
47:35
we're trying, which
47:38
is an IoT enabled switch.
47:40
I can't find it. I've got
47:42
so many messages with the
47:44
Sparky here. One of
47:46
the nice things about it, even the sparky is saying,
47:48
you know, rather than... let's say you've got like a
47:50
five-gang switch, which is about the worst. You
47:53
end up with so many wires at the back
47:55
of each mech and then a shelly on each
47:57
one. And you just pull it out of the
47:59
wall and you're like, my God, like what is
48:01
going on here? If you have it
48:03
built into the switch, he was making the point
48:05
that they've got nice little, as long as the
48:07
little junction boxes are all the way, it's just
48:09
going to the top and it's neat and it's
48:11
clean and it's tidy. And yes, it's behind the
48:13
wall, but if you're changing stuff and mucking around
48:15
with it, it looks so nice.
48:17
So I'll let you know what they are.
48:19
They'll have to come next week. James
48:23
says, I do like having something like
48:25
Zigbee as a firewall from the internet.
48:27
I don't worry about firmware on those devices.
48:30
I mean, yeah, there's a lot of nice
48:32
things about ZigBee. Wayne's excited, I think
48:35
about their branding. Matthew,
48:37
I want to keep the physical the
48:39
same as the digital, but allow for
48:41
things like press and hold or double
48:43
tap. Yep, yep, understand. Now
48:45
this is, Lutron does good, but I think they are
48:47
US only and expensive. Yeah, and I did
48:49
look at Lutron when I went through
48:51
this first round of stuff, but I think
48:53
one of the things I didn't like
48:56
about Lutron is I just didn't like the
48:58
look of them. I didn't think it
49:00
was right for our house. And then your...
49:02
you're stuck with a physical compromise in
49:04
order to do the digital thing, which just
49:06
felt a bit counterintuitive. All
49:08
right, folks, I'm gonna wrap it up there. I gotta
49:10
go to the airport. I've just got an email from a
49:12
journalist confirming a story is now live about a data
49:14
breach, which I now need to load as soon as I
49:16
get to the airport. So look out
49:19
for that one if you like spyware, little
49:21
teaser. All right, folks, see you soon.