Episode Transcript
0:00
Hello everyone, welcome to Season
0:02
2, Episode 2 of the Learn
0:04
System Design Podcast, with your host,
0:06
me, Benny Ketchel. This
0:09
week we are going to continue down
0:11
the primer of building a system . These
0:14
episodes are geared a little
0:17
more towards the system design interview
0:19
that you'll find at most
0:21
tech companies, especially if you're going in as
0:23
a more senior candidate
0:25
, but I also just want to make clear
0:27
that these are the exact same steps and considerations
0:30
that I try to take into account when designing
0:32
a new system , whether that's
0:34
a fresh product or something from the
0:36
ground up . If you haven't yet
0:38
, I definitely recommend listening to the last
0:41
episode , that's episode number eight
0:43
, before listening to this one , as we will
0:45
be talking a lot retrospectively
0:48
about the functional and non-functional requirements
0:50
, not just in this episode but the next
0:52
couple of episodes , because all
0:54
of these topics are so tightly coupled
0:56
together . They're very important and they
0:59
sort of reinforce what you're doing
1:01
and why you're doing it . Yeah
1:03
, it's just important not to get lost in the weeds on specific
1:03
topics and
1:47
, in my personal opinion , today's topic
1:50
is not only the easiest to get caught
1:52
in but it's also the easiest
1:54
to just completely wreck an interview
1:56
. That topic , of course , is
1:59
capacity estimates . It
2:01
is almost a rite of passage
2:03
as an engineer to find
2:06
yourself in a 45-minute
2:08
system design interview
2:11
and spend about 40 of those
2:13
minutes trying to do math and make
2:15
broad-stroke assumptions about
2:17
how much load you might have. Do I have 1.6
2:20
million daily active users? Do I have 1.625
2:23
million daily active users? Do I have 6
2:26
million daily active users? You
2:28
know these sorts of things and it's nothing to be ashamed of . That's a part of this interview
2:30
and it's a part of the process to learn how
2:32
to actually handle these sorts of ideas
2:35
and what it actually means
2:37
to scale . But , honestly
2:39
, here's the little secret that I've learned:
2:41
the answer to how
2:44
many people are going to be using the product,
2:46
or how much data throughput
2:48
you need to handle, is
2:51
always going to be a lot. Not
2:53
1.265, not 1.5.
2:56
It's just going to be a giant number, one
2:58
that is very giant and doesn't
3:00
actually help you that much, because
3:03
here's the catch: it doesn't really
3:06
matter. The answer will always
3:08
be a giant number that
3:10
makes you feel like you need to focus on it. So
3:13
instead of spending all your time
3:15
on the arithmetic, today
3:17
I'm going to teach you how
3:20
to take the size into consideration
3:22
in a more streamlined fashion,
3:24
so that you don't get caught up
3:26
on it. Let's think
3:28
about writing an algorithm , for instance
3:30
, something you've probably
3:32
done a lot , whether it's in school
3:35
, at work or in an interview
3:37
. When you do that
3:39
, you don't try to estimate the
3:41
number of people that are going to be using your
3:43
system . You don't try and estimate how
3:46
many times a certain piece of data
3:48
will go through a loop or what
3:50
have you . You think about the worst case scenario
3:53
. You express that number
3:55
relatively. For example,
3:57
this loop will take big O of N
3:59
in time complexity, right? Then
4:02
why, when designing a system, are we
4:05
so mathematically precise, like
4:07
trying to find how long
4:09
2,435 gigabytes
4:12
of video being read from sequential
4:14
memory will take if it's on a spinning
4:16
disk ? You know these sorts of things
4:19
are important , but the
4:21
specifics aren't that important . Instead
4:24
, let's focus on the crux
4:26
of the problem: our
4:29
estimations and our capacity
4:31
, not our specifics and
4:33
the arithmetic . So what then
4:35
do we estimate ? The amount
4:37
of read and write throughput in your system
4:40
, for instance , is important , and
4:42
the amount of storage your
4:44
system will need to hold.
4:46
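If it helps to see that written down, here is a minimal back-of-napkin sketch; every input below is an assumed example value invented purely for illustration, not a number from this episode:

```python
# Back-of-napkin sketch: the two quantities worth estimating.
# Every input here is an assumed, made-up example value.

writes_per_day   = 100_000_000     # hypothetical daily uploads
avg_object_bytes = 1 * 1000**2     # assume ~1 MB per object
retention_days   = 5 * 365         # assume we keep everything ~5 years

daily_write_volume = writes_per_day * avg_object_bytes        # bytes ingested per day
total_storage      = daily_write_volume * retention_days      # bytes we must hold

print(daily_write_volume / 1000**4, "TB written per day")     # ~100 TB/day
print(total_storage / 1000**5, "PB retained overall")         # ~180 PB
```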
When dealing with the numbers, just keep things in
4:49
factors of a thousand, right
4:51
, because the more
4:53
you round , the easier
4:56
it is , and the bigger the number , the less
4:58
the specifics matter . If I
5:00
say I'm going to give you $110
5:04
, you might just tell
5:06
everyone oh yeah , ben gave me $100
5:08
, right , the 10 doesn't really
5:11
matter , just like when you're doing
5:13
an algorithm . Is it big O
5:15
of n plus 2 ? Well
5:17
then , it's just big O of n , the plus 2 doesn't actually
5:20
matter. And so what
5:22
I mean when I say stick to factors of 1,000,
5:24
simply put,
5:27
is that it's the difference
5:29
between 153,670 people and
5:31
154,000 people. Right,
5:33
those extra few hundred people
5:35
are not going to bankrupt your company, it's
5:38
not going to break your system , and
5:40
it's a lot easier to do that sort
5:42
of back of napkin math with
5:44
nice round numbers.
5:47
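For anyone who wants that rounding habit spelled out, here is a tiny sketch; the helper name and the significant-figure choices are mine, purely illustrative:

```python
from math import floor, log10

def napkin_round(n, sig_figs=1):
    """Keep only a few significant figures and throw the rest away."""
    if n == 0:
        return 0
    magnitude = floor(log10(abs(n)))
    step = 10 ** (magnitude - sig_figs + 1)
    return round(n / step) * step

print(napkin_round(110))            # -> 100     (the "$110 is basically $100" example)
print(napkin_round(153_670, 3))     # -> 154000  (close enough for napkin math)
```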
So when, then, is it important to do those
5:49
quick calculations that I speak of
5:51
? Well , we'll get to that a little
5:53
later . For now , let's make sure we have
5:56
a few things memorized . These are
5:58
the important pieces of information you
6:00
should bring into any interview or
6:02
any calculation when you want to consider
6:04
your capacity . For some of
6:06
you it might feel like a refresher or common
6:09
knowledge , but for others it may
6:11
be the first time considering it , so I
6:13
want to cover it that way regardless . We're
6:15
all on the same page going into
6:17
how to do this back of napkin math
6:20
and what's really important
6:22
about it . So the first
6:24
thing is to always remember your
6:27
scales of 1000 and how they relate
6:29
to data sizes . In
6:31
this case I'll be using bytes , but
6:33
these factors can technically
6:36
be applied to anything that is a metric
6:38
. When it comes to tech , when
6:41
we think of data sizes
6:43
, we usually describe them as bits and bytes, right?
6:45
But it is important to understand
6:47
the levels of these sizes
6:50
relative to one another . For
6:52
every thousand increments , we use a different
6:55
prefix . So for
6:57
every thousand bytes , we have
6:59
a kilo , like a kilobyte . For
7:01
every 1 million bytes , which , you
7:04
might note , is 1000 squared , we
7:07
use mega , like megabyte , and
7:09
so on and so on . I'll get to the rest
7:11
in a minute . The important ones
7:13
to remember are that 1000
7:15
or less is just the base unit. For example
7:17
, 560 bytes
7:20
of data and if you think
7:22
about it , a thousand raised to zero
7:24
is one , right ? A thousand raised to
7:26
one is a thousand , so
7:28
that would be a kilobyte . A thousand raised
7:31
to two would be a million . A
7:33
thousand raised to three would be a
7:35
billion . A thousand raised to four would
7:38
be a trillion , right . And so the
7:40
designations for those in
7:43
order would be just a byte
7:45
, a kilobyte , which is our thousand
7:47
, a megabyte
7:50
, which is a million , a gigabyte
7:52
, which is a billion , and then
7:54
terabyte , which is
7:57
a trillion, right?
8:07
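Sketched out in code, and assuming decimal (factor-of-1,000) prefixes rather than the 1,024-based ones, that ladder might look like this; the helper is illustrative, not any standard library function:

```python
# The powers-of-1000 ladder from above, plus a helper that folds a raw
# byte count down to "<small number> <prefix>" form.

PREFIXES = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes", "petabytes"]

def humanize(num_bytes: float) -> str:
    power = 0
    while num_bytes >= 1000 and power < len(PREFIXES) - 1:
        num_bytes /= 1000          # strip one factor of a thousand
        power += 1
    return f"{num_bytes:.2f} {PREFIXES[power]}"

print(humanize(560))                    # 560.00 bytes
print(humanize(1_350_000_000_000))      # 1.35 terabytes
```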
It's probably easier to do arithmetic without having, you know, nine zeros,
8:09
just having a thousand raised to three, because, again, once you take the
8:11
three out, ignore it, perform your arithmetic,
8:14
then add the cube back in.
8:16
You're getting a good idea
8:18
of the scale without a lot of
8:20
very complicated
8:22
arithmetic going
8:24
on. In actuality, when you
8:26
do your capacity estimates , it shouldn't
8:29
take you longer than five
8:31
to 10 minutes of the interview . It's honestly
8:34
sometimes possible
8:36
to just say hey , interviewer
8:38
, I'm going
8:41
to skip over this for now . I know it'll
8:43
be a large number , I'll give it 1.5
8:46
based on past examples
8:49
, and sometimes the interviewer will
8:51
just say okay , yeah , no problem
8:53
, I want to know how you think , I want to know
8:55
how you would approach the problem . I don't
8:57
care whether or not you can add
8:59
a bunch of large numbers.
9:06
But we can even go further than that, right? So let's take
9:08
into consideration that you have 1.35 trillion bytes,
9:10
right? It's a lot easier
9:12
to drop all those zeros and just have it be 1.35,
9:16
and then say, well, if
9:18
we want to 3x scale this, you can multiply
9:20
1.35 times 3, rather
9:23
than some obscure number
9:25
like 1,356,234,000,000
9:27
times 3. Right,
9:30
one of those is going to take you significantly
9:32
less time to parse through and
9:34
do that back of napkin math if
9:36
it's necessary.
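Here is a rough sketch of that trick: keep the small number and the power of a thousand separately, do the arithmetic on the small number, and only re-normalize at the end. The names and the carry step are mine, purely for illustration:

```python
PREFIX = {2: "MB", 3: "GB", 4: "TB", 5: "PB"}

def scale(mantissa: float, power: int, factor: float) -> str:
    """Scale e.g. 1.35 TB (mantissa=1.35, power=4) without writing out the zeros."""
    value = mantissa * factor
    while value >= 1000 and power + 1 in PREFIX:   # carry into the next prefix if needed
        value /= 1000
        power += 1
    return f"{value:g} {PREFIX[power]}"

print(scale(1.35, 4, 3))    # 1.35 TB scaled 3x -> "4.05 TB"
print(scale(400, 3, 5))     # 400 GB scaled 5x -> "2 TB"
```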
9:38
The important part is that you can take those numbers,
9:40
say with reasonable confidence
9:43
that this is 1.35
9:45
gigabytes or terabytes, and
9:47
know the difference
9:49
in scale between those numbers. And if the
9:52
interviewer presses you on
9:54
what that actually is, you can tell
9:56
them, oh, that's 1.35
9:58
trillion or 1.35
10:00
quadrillion, or what
10:03
have you. And it shows
10:05
that you know what you're talking about, that you know
10:08
the numbers, and it's not just you
10:10
playing around;
10:12
you're doing the calculations, but you're
10:14
making it a lot easier for yourself. You're
10:17
working smarter. And
10:19
speaking on this concept
10:21
of working smarter , you
10:24
know , sometimes these tests can
10:26
take on specific constraints
10:29
and sometimes you need to think about a budget
10:31
. And that honestly brings
10:33
me to my next important factor that
10:36
you know we need for understanding
10:38
the dynamics of latency
10:41
across different constraints
10:43
on our system . You will
10:45
remember latency from
10:48
episode one and episode
10:50
three and also kind of across
10:53
our core episodes
10:56
and our non-functional
10:58
requirements from the last episode as well
11:00
. The comparisons I'm about
11:03
to give you will directly link
11:06
to the non-functional requirements
11:08
from before . So
11:10
, if you're keeping track , currently we are on
11:13
step three in this whole process
11:15
and we are already calling
11:17
back to step two . By
11:20
considering the latency and making
11:22
call-outs about specific hardware , you're
11:25
already checking this
11:27
non-functional requirement off the
11:29
list . So let's talk
11:31
about the hard numbers . Right , to
11:34
read one megabyte
11:36
from memory , the entire
11:39
process will take around a
11:41
quarter of a millisecond , which is pretty
11:44
fast . But , as you may or
11:46
may not know , memory
11:53
is temporary , so you can't just hold everything in it . You need
11:55
some sort of long-term storage , and so the next fastest thing for our
11:58
process is solid-state drives, and
12:00
to fetch the exact same
12:02
amount of data, one megabyte, from
12:05
an SSD that you just
12:07
fetched from memory , if
12:10
you remember, which was a quarter
12:12
of a millisecond , is actually
12:14
a 4x slowdown . Yeah
12:17
, that's right. So fetching one megabyte
12:20
of data from an SSD actually
12:22
takes an entire millisecond.
12:27
But of course, solid-state
12:29
drives are more expensive than a more traditional spinning disk
12:31
drive. How much more expensive? Well,
12:33
if we had talked not that long ago,
12:35
it would have been $40
12:37
per gigabyte for an SSD versus $0.05
12:40
per gigabyte for a spinning disk, but thanks
12:42
to innovation and beautifully minded,
12:44
wonderful people, today
12:47
it's more along the lines of $2 per
12:49
gigabyte versus 5 cents per gigabyte,
12:51
which, again, may not seem like
12:53
a lot, but when you get into petabytes
12:56
worth of data, it means a big,
12:58
big bill.
13:00
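To put those per-gigabyte prices in perspective, here is an illustrative bit of math that treats the quoted figures as simple purchase prices and ignores everything else (redundancy, replacement, power):

```python
COST_PER_GB = {"ssd": 2.00, "hdd": 0.05}   # rough dollars per gigabyte, as quoted above

def storage_cost(petabytes: float, medium: str) -> float:
    gigabytes = petabytes * 1000**2        # 1 PB = 1,000,000 GB
    return gigabytes * COST_PER_GB[medium]

print(storage_cost(1, "ssd"))   # 2000000.0 -> ~$2M to put 1 PB on SSDs
print(storage_cost(1, "hdd"))   # 50000.0   -> ~$50K on spinning disks
```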
Why, then, do most engineers design with
13:02
SSDs in mind when we don't
13:04
have these specific constraints ? Because
13:06
fetching one megabyte of data from
13:09
a spinning disk hard drive takes
13:11
a whopping 20 milliseconds , 20
13:14
times slower than an SSD
13:16
, 80 times slower than
13:18
fetching it from memory.
13:21
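Collected in one place, with a tiny estimator, those numbers look roughly like this; they are the order-of-magnitude figures quoted in this episode, not benchmarks of any particular hardware:

```python
# Milliseconds to read one megabyte sequentially, per the episode's rough figures.
MS_PER_MB = {
    "memory": 0.25,   # ~a quarter of a millisecond
    "ssd": 1.0,       # ~4x slower than memory
    "hdd": 20.0,      # ~20x slower than an SSD, ~80x slower than memory
}

def read_time_seconds(megabytes: float, medium: str) -> float:
    """Back-of-napkin estimate: size times per-MB latency, nothing fancier."""
    return megabytes * MS_PER_MB[medium] / 1000

# e.g. a 2 GB (2,000 MB) file read straight off each medium:
for medium in MS_PER_MB:
    print(medium, read_time_seconds(2_000, medium), "seconds")
```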
Hard disks still have their place, though.
13:23
Their continued use lies in
13:26
the ability to utilize them as cold
13:28
storage. If your
13:30
data is not being accessed a lot or
13:33
it's not super important that it comes up immediately
13:35
, maybe you can load it in the background . Storing
13:38
data on spinning disks is perfectly fine
13:40
, often encouraged to save some
13:42
money . Honestly
13:44
, one clever way I have seen cold storage
13:47
implemented in this way is with user
13:49
data , which might seem a little
13:51
strange . But follow me , think
13:53
about a video game that's gone viral
13:55
. Everyone is playing it on day one and
13:57
they're super excited and everyone's logging in
13:59
constantly . They've created their login
14:02
, modified their character and
14:04
it's been a month and they're over
14:06
it . Some people might stick around
14:08
and when they log in they want that process
14:10
to be quick . You want to get them in game
14:12
as quickly as possible . Again
14:15
, see the Amazon reference from episode
14:17
one and how fast you lose money when
14:19
things are slow . But if you're someone
14:21
who hasn't logged on in a while maybe
14:24
over a year then you're a little
14:26
more patient with logging in . You have no point
14:28
of reference for how long it should take
14:30
to be logged in, and, you know,
14:33
you sort of have a little
14:35
bit more flexibility in that sense, and
14:37
so what you can do is on the
14:39
back end you can have like a cron job
14:41
that checks for the last login
14:44
time for a user and if it's
14:46
been over say , a month , move
14:48
that user data from the SSD
14:51
database to the hard drive database
14:53
. If , for
14:55
whatever reason , they log back on , you
14:58
just move that data back to the SSD
15:00
and if they never
15:02
log on again , no worries , you
15:04
aren't being charged a ton of money to store
15:06
it, and the data is always there.
15:08
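A sketch of that demote-and-promote idea is below. Everything in it is hypothetical: hot_db and cold_db are made-up stand-ins for an SSD-backed store and an HDD-backed store, not a real API.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=30)   # assumed cutoff from the example above

def demote_stale_users(hot_db, cold_db, now: datetime) -> None:
    """Scheduled job (e.g. a nightly cron): move users who haven't logged in
    for a month from the SSD-backed store to the cheaper HDD-backed one."""
    for user in hot_db.all_users():
        if now - user.last_login > STALE_AFTER:
            cold_db.put(user)
            hot_db.delete(user.id)

def load_user_on_login(user_id, hot_db, cold_db):
    """If a returning user's record lives in cold storage, promote it back."""
    user = hot_db.get(user_id)
    if user is None:
        user = cold_db.get(user_id)
        if user is not None:
            hot_db.put(user)        # move the record back to fast storage
            cold_db.delete(user_id)
    return user
```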
The
15:10
next piece of information I want you to sort of memorize
15:13
is the rough size of data in
15:15
a storage capacity . Have
15:18
you ever thought about how much space
15:20
the things you interact with on a daily basis
15:22
take up ? Let's talk
15:24
about a company like Netflix. They
15:27
get roughly 100 million videos
15:29
streamed a day . That
15:32
in itself is a gigantic number
15:34
, even if you're just talking about pictures
15:36
But you know, Netflix of
15:38
course works with video and
15:41
the rough size of a two-hour movie
15:43
on average is about one to two
15:45
gigabytes, and that's not 4K.
15:47
If we're talking about 4K
15:49
high-res movies , we're looking
15:51
somewhere in 10 to 20 gigabytes
15:54
apiece . And again , as
15:56
we talked about before , the number here doesn't
15:58
technically matter . Netflix
16:00
simply deals with a lot of data
16:02
. So having these rough estimates
16:05
is handy when trying to think intelligently
16:07
about the amount of data you're dealing
16:10
with . So if
16:12
a two-hour movie is one
16:14
to two gigabytes, then if
16:16
you think about a small book worth of text
16:18
or a high-res photo, you're
16:20
looking more around a megabyte,
16:22
whereas a medium-resolution photo
16:24
can get as small as around 100
16:27
kilobytes.
16:29
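As rough constants, those ballpark sizes might be jotted down like this; the Netflix arithmetic at the end is my own illustration of the scale, not a real Netflix figure:

```python
APPROX_SIZE_BYTES = {
    "medium_res_photo":   100 * 1000,      # ~100 KB
    "high_res_photo":     1 * 1000**2,     # ~1 MB, about a small book of text
    "two_hour_movie":     2 * 1000**3,     # ~1-2 GB, non-4K
    "two_hour_movie_4k":  20 * 1000**3,    # ~10-20 GB
}

streams_per_day = 100_000_000              # Netflix-scale streams per day
daily_egress = streams_per_day * APPROX_SIZE_BYTES["two_hour_movie"]
print(daily_egress / 1000**5, "petabytes served per day")   # ~200 PB/day, order of magnitude
```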
So it's safe to say, if you're building something
16:31
like Netflix versus building something
16:33
like Instagram , for instance , you're
16:35
going to approach it in a different
16:37
way . And
16:40
for our final piece of information
16:42
that I want you to remember , I want to talk
16:44
about the rough sizes of these companies'
16:46
operations . This is to help
16:48
you memorize the scale at which you'll
16:50
need to think about the data being processed
16:53
, that sort of throughput , for
16:56
instance , designing a system that will handle
16:58
the same load as a social
17:00
media network, you're looking at around
17:03
a billion daily active
17:05
users . We talked
17:07
about Netflix and how they stream 100 million
17:09
videos a day . That's very important as well
17:12
. And Google fields around
17:14
100,000 queries
17:16
per second, and
17:19
building an app like Wikipedia means
17:22
storing data somewhere in the neighborhood
17:24
of 100 gigabytes if it's uncompressed.
17:27
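As a cheat sheet, those rough operating scales could be written down like so; treat them as anchors to memorize, not precise figures:

```python
REFERENCE_SCALE = {
    "social_network_daily_active_users": 1_000_000_000,   # ~1 billion DAU
    "netflix_videos_streamed_per_day":   100_000_000,     # ~100 million per day
    "google_queries_per_second":         100_000,         # ~100K QPS
    "wikipedia_uncompressed_bytes":      100 * 1000**3,   # ~100 GB of text
}
```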
So try to remember these
17:29
numbers so that if you go into an
17:31
interview and they say , design Netflix
17:33
or design Wikipedia
17:35
, design Google (you know, these very
17:37
common questions that you get), you
17:40
can sort of already know: okay,
17:42
not only am I
17:44
going to need to handle something like
17:47
a billion daily active users,
17:49
but a billion daily active users
17:51
that will be
17:53
uploading
17:55
possibly a
17:58
megabyte's worth of photos each, and
18:00
they're also going to be reading a
18:02
megabyte's worth of photos, or
18:04
more if
18:06
they have a feed. You're going to be showing them
18:09
a lot of their friends, and each
18:11
one of those friends has a one-megabyte photo,
18:13
and so now you're sending a
18:16
billion daily active users photos,
18:18
times the number
18:21
of friends they might have on average. But
18:26
again, these sorts of quick maths, right? It's a lot easier to say,
18:28
well, one billion, okay, well,
18:30
that's just one.
18:33
And so if
18:35
it's one megabyte, okay, then roughly
18:37
one megabyte times a billion people:
18:39
we're looking in the neighborhood of
18:41
a petabyte worth of data
18:44
flooding through our
18:46
system. And okay, now then,
18:48
is it a read system or is it a write system?
18:50
Well, more people are reading on
18:53
Instagram and looking at pictures than
18:55
are uploading. So, okay, I need
18:57
to focus on making sure I can handle the
18:59
throughput of having a petabyte worth of
19:01
data going out a
19:03
day and being read
19:05
from my system.
19:08
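Written out, that back-of-napkin feed math looks like the following; the average-friends number is something made up purely to show the shape of the calculation:

```python
daily_active_users = 1_000_000_000      # ~1 billion, per the reference scales above
photo_size_bytes   = 1 * 1000**2        # ~1 MB per photo

daily_write_volume = daily_active_users * photo_size_bytes
print(daily_write_volume / 1000**5, "PB uploaded per day")     # ~1 PB/day written

avg_friends_in_feed = 50                 # hypothetical, just for illustration
daily_read_volume = daily_write_volume * avg_friends_in_feed
print(daily_read_volume / 1000**5, "PB read per day")          # reads dominate writes
```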
Finally, I want to give
19:10
you a reminder about the common mistakes
19:12
to avoid and how to approach this
19:14
step when designing a system . If
19:17
you're taking an interview , you might be able
19:19
to brush past this step a bit
19:21
. As I said, saying something
19:23
like the system will share similarities
19:25
to Netflix, so I want to consider around
19:27
100 million videos at two gigabytes
19:29
apiece. Sometimes that's enough for the interview
19:31
, but they might push for a little extra
19:34
math and you want to give it some thought and
19:36
do those calculations like I just did
19:38
with a social media app like
19:40
Instagram, but avoid trying
19:42
to calculate , like how many hard drives
19:44
it will take to hold it , or getting
19:47
too low-level with the numbers and
19:49
getting too specific with the numbers you're using
19:51
. On the other hand , if
19:54
you are designing a system in a real world
19:56
scenario , your capacity considerations
19:58
are important . You should consider
20:01
the cost of hardware. You should
20:03
consider
20:05
using
20:07
an SSD versus an HDD
20:09
or both . Sometimes
20:11
using a spinning
20:14
disk drive is a lot cheaper . Sometimes
20:16
it's easier to host your own servers
20:18
than to use the cloud . Sometimes
20:21
it's a lot more expensive . It just
20:23
depends on your situation . Regardless
20:26
, remember to focus on the
20:28
crux of the problem . If this
20:30
is primarily a video-based service
20:32
, like Netflix , you don't need
20:34
to worry about calculating the size of
20:36
text for descriptions or the
20:39
size of the avatar
20:41
for the user and what that would
20:43
mean, right? Focus on the big things: the videos
20:46
, the users consuming those videos
20:48
, how they're consuming them , and
20:50
work from there . At
20:53
the end of the day , the elements of
20:55
capacity estimates that
20:57
are always good to remember are your core facts
21:00
. Think about your numbers in,
21:02
you know, factors of a thousand. Keep things
21:04
high level and focus
21:06
on what your system should be doing , not
21:09
necessarily how many specific
21:12
times it should be doing it . It
21:14
will almost always be impossible
21:16
to take every little thing into consideration
21:19
, especially in an interview , so
21:22
just try and focus on the crux of the
21:24
problem , not all the little things that
21:26
might pop up . It
21:30
is perfectly fine to make small mistakes . No one is judging you on your ability
21:32
to multiply a couple of numbers
21:34
together . Instead , I want
21:36
to know that you have a good
21:38
idea of scale and a rough size
21:40
of the data that's being handled, and that
21:42
you can take that into consideration
21:45
. From there , we can start talking
21:47
about how to scale the system , how to work with
21:49
it , et cetera , et cetera . Next
21:52
episode , we're going to be focused on steps
21:54
four and five . We'll be talking about
21:56
DB and API design
21:58
. These are very
22:00
important things. They help you flesh out your
22:02
models and understand what
22:05
an API will look like and
22:07
how the data will be flowing through your
22:09
system . I want to give a special
22:11
thank you for everyone that reached
22:13
out. Of course, Antonio Lettieri
22:15
, you had a great call out on our load
22:17
balancers episode , which I greatly appreciate
22:20
. Gamesply and BeerX
22:22
you guys have been killing it on the discord
22:25
, just making everyone feel welcome, and
22:27
I greatly appreciate that , we
22:29
also got a couple pieces of fan mail . Um
22:32
, unfortunately I can't reply and
22:34
I can't see your name . Uh
22:36
, so to the wonderful person in Del
22:38
Mar, California, and the other
22:40
wonderful person in the United Kingdom: thank
22:43
you so so much for the feedback and thank
22:45
you for the fan mail . And
22:47
, yeah , if you want to have
22:50
a more specific shout-out , feel free to send
22:52
me an email . And finally
22:54
, the biggest thank yous to everyone
22:56
on the Patreon: Jake
23:09
Mooney, Charles Cazals, Eduardo Muth-Martinez
23:11
. Thank you so so much for
23:14
everything for supporting us on Patreon . Later
23:17
this month , I'm hoping to release a special episode
23:20
just for everyone on Patreon
23:22
. You guys are still getting the episodes a week early
23:24
, but I also want to do a special episode
23:26
just for you , focusing on authentication
23:28
. We did have a couple of people
23:30
vote on the poll asking for
23:33
an authentication episode , so I
23:35
want to do that special episode and eventually
23:37
I will release it on the main channel
23:39
, but it might not be for a month
23:41
or so , so they're getting a special
23:43
thing over there . So very much appreciated to
23:46
all of you . I will also be posting
23:48
a new poll very soon on Patreon , probably
23:50
around the time this goes out , asking
23:53
for what specific
23:55
interviews you guys want
23:57
me to start tackling . This has all just
23:59
been a primer , but I actually want
24:01
to talk about specific interviews , how
24:03
I would approach them , do some
24:05
research on the best ways to approach them
24:08
and sort of flesh that out for you guys
24:10
. So , definitely
24:12
, please go to the Patreon
24:15
, become a supporter if you have the
24:17
means , if you just
24:19
enjoy listening . It would mean the world
24:21
to me if you just provided some feedback
24:23
and send an email , tell
24:25
a friend , anything like that . It
24:28
all means the world to me , so I very much appreciate
24:30
it . If you would like to suggest
24:32
specific topics that you want
24:34
me to jump on , feel
24:37
free to drop me an email at learnsystemdesignpod
24:39
at gmail.com. Remember
24:42
to include your name if you'd like a shout out , if
24:44
you would like to help support the podcast , help me
24:46
pay my bills, again, please jump to soundcloud.com
24:49
slash aimless orbiter music and, with all
24:52
that being said, this has been the Learn System Design Podcast. Thank
24:54
you.