Passing Messages

Released Friday, 14th February 2025

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.

0:20

Hey, Ben.

0:22

Hey Matt, dancing to the theme music.

0:23

How are you doing, friend? We both are. I know. Yeah, again, we've said this before, but the new recording setup means that we get to play the intro music, if only just to get the timings roughly right.

0:34

Mm hmm.

0:34

And so we are bopping away. And actually, thinking of that, in the background (not that our listener can see, because we never record the video for this) there's a huge box of nonsense behind me, if you can see it over here.

0:46

Mm hmm.

0:47

So I'm traveling at the moment. I'm at my parents' house in England, which makes this an international recording.

0:54

And I've been mining all of the stuff that I left behind as a kid to send home, which was actually kind of a miracle. I had two large boxes of cool stuff of mine that I'd left here because, you know, I wasn't planning on being in the States for more than a couple of years, but 14 years on I'm like, well, maybe I should take this home now.

0:54

Yep.

1:12

Right.

1:14

I stuck a sticker on the side of these boxes and I took them to a local shop.

1:14

Yeah.

1:19

Magic happened and then my wife texted me a picture of them on my front porch in America. It's like, I know that's how that works, but I was still so excited to see my things get there in like a day and a half.

1:30

It was brilliant. But now I'm sending all of this stuff off to the person who composed our theme music, Inverse Phase, Brendan Becker, for his museum.

1:38

nice.

1:41

In fact, he has got great news. He has just raised enough money to buy his own property, and he's moving his museum of cool games and weird and wonderful computer tech paraphernalia over the years into Pittsburgh.

1:58

So at some point next year, I will definitely be going to Pittsburgh to see him and the amazing things. Anyway, weirdest thing. What are we talking about today, Ben?

2:09

Today, in addition to that, in addition to cool museums, we are talking about

2:15

We are talking about messaging systems.

2:23

There are various systems in the world that you design around the idea that some service somewhere is going to send you a reasonably small message, and you're going to process that message, and then you're going to send

2:37

Mm hmm.

2:40

one or maybe more or maybe zero messages out to another thing. And the whole architecture of the system is designed with just this sort of message passing in mind. And oftentimes when you have systems like this, you have distributed computing problems.

2:54

Yep.

2:56

You have sort of reproducibility concerns that you need to think about. And so I thought it would be a good idea to talk about some of the things in our experience, having built some systems like this.

3:06

Right.

3:09

Yep.

3:09

And we can talk about maybe what some of those systems were just for context.

3:11

But in our experience building systems like this, what are some things that you should do? And what are some things that you should definitely not do?

3:20

Interesting, yes. So you mentioned small messages there, so we're not talking a bulk data thing here. What would be a canonical example of this kind of system?

3:33

Yeah. Well, I think starting right off with some of the things that you should not do is I don't think that you should put gigabytes of data into something and call it a message.

3:38

Okay.

3:44

That is something that I would be skeptical of, if someone was like, well, can't we just take this, you know, three gigabyte file and stick it in there?

3:46

I mean, strictly speaking, you can, but in these kinds of systems, you wouldn't want to...

3:58

It's like, maybe you shouldn't do that.

3:58

So, again, we're talking broadly about something which could be using something like Kafka as a sort of mental model: hey, you're going to just put a sequence of messages somewhere. Or it could be some other system, but I'm just putting Kafka in my head for now, as something that probably most of the audience might have heard of.

4:16

Yeah.

4:20

And then obviously that's a great example of something you shouldn't do: putting a massive message into a message queue system. They're usually not good at larger pieces of data.

4:30

Mm hmm.

4:32

Sometimes your recipients will want to discard some messages, and if you force them to download a three gigabyte file just to discover they don't want it, that's not what you wanted.

4:46

Mm hmm.

4:46

That's not good behavior. So, you know, the typical solution I can think of for that is that you normally put bulk data somewhere else, be it, say, an S3 bucket, or some shared file system, or some other system.

4:57

And then you send a message that says, hey, there exists some big data somewhere else that you can get hold of. Is that the kind of thing?

5:03

Yeah, yes, that's what I've done as well: you have basically a pointer to some other large piece of data, whether it's a file in object storage or, one thing I've seen, embedding a SQL query that's, you know, bitemporal.

5:18

So when you run it, you always get the same results. You can put that in the message and be like, oh, there's some data available here if you want to query it, right?

5:24

Oh, that's neat.

5:25

But the core idea here is: don't put a bunch of data into a messaging system, whether that's just a system that's passing messages or a queue, like Kafka or some other type of queue.

5:36

Yep.

5:38

Instead, put in something that allows the consumers of that stream to fetch that data if and when they want, based on maybe some metadata that you include in the message.
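
A minimal sketch of that pointer-message idea, assuming an S3-style object store via boto3; the bucket name and the `send_message` callback are invented for illustration:

```python
import json
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "bulk-payloads"  # hypothetical bucket name

def publish_large_payload(payload: bytes, send_message) -> None:
    """Store the bulk data out of band; send only a small pointer message."""
    key = f"payloads/{uuid.uuid4()}"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    # The message itself stays small: a pointer plus enough metadata for
    # consumers to decide whether they want to fetch the data at all.
    send_message(json.dumps({
        "type": "bulk-data-available",
        "bucket": BUCKET,
        "key": key,
        "size_bytes": len(payload),
    }))
```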

5:42

Got it. Yep. Now, obviously, by doing that you have added another system to an otherwise straightforward system: if I was testing this, I would need to mock out both the retrieval system and the message queue system.

6:01

There's an allure to saying, Hey, let's just throw it in the one system and then everything's a message and we don't need anything outside.

6:06

So I can see why. But there is a blurry line: like, three gig of data, that's too much, but maybe, you know, 300K? I don't know. Maybe. Yeah.

6:06

Mhm.

6:11

Yeah. Yeah, yeah, yeah. Yeah, right. It's like that maybe is fine. Yeah.

6:19

So yeah, all of that I think starts to become a lot more context sensitive. And maybe it's worthwhile talking about some of the systems that we have built, to paint a little picture of some of this context and be able to talk about the trade-offs that we're talking about here in those contexts.

6:37

Yeah. Okay. So what kind of systems have we built? What do you want to start with? What would you be happy talking about first?

6:43

Well, I mean, the three main systems that I think of that I built that are like this: there is a sort of infrastructure and monitoring system that I built at a trading firm.

6:58

And then at that same trading firm, I actually worked on... yes, that is pantomiming the logo of the system that we built. And at that same firm, I actually also built a trading system for event trading.

7:18

So this is discrete events that are happening in the world, and we would name news as an example of that. And we would trade those events.

7:25

Right. So election results come in, kind of thing, and you're like, hey, if this person wins then the market moves this way, or, you know, if a drug gets [approved]...

7:32

Yeah, tweets, we would trade tweets. I mean, things like that, you know, press releases, those kinds of things.

7:40

Right.

7:40

And that was extremely latency sensitive, right?

7:43

Like that trade is basically you're racing the speed of light. And so that had its own special constraints.

7:47

Right, because you and everybody else know that if the Fed puts the interest rate up, then the market will react in a particular way, and you want to either take advantage of it or, you know, protect your own position, whatever.

7:54

Exactly.

7:59

But yeah, interesting.

7:59

Yeah. So, you know, in that example, a queue is just right out.

8:01

Yeah.

8:05

You can't queue anything.

8:06

ha

8:06

Right, that's not going to work. And then probably the third one was the system that we collectively built at Coinbase,

8:18

I was thinking about that one.

8:20

which was an exchange, right? Coinbase hired Matt and me and a few other people to build a replacement for their cloud-based exchange. And what happened with that is a big long story, which is maybe another podcast, but nonetheless...

8:34

Or not....

8:38

Yeah, right, or not, honestly. You can read about it on the internet if you want. How about that? I think that's the best way to do that.

8:42

Yes.

8:44

But nonetheless, we built an exchange. And that is very much a system like this, where you're passing messages around. So those are the three that sort of spring to mind for me.

8:50

Right. And just concretely for those: an exchange in this instance is a service where many people are sending messages into the system to buy and sell a commodity, in this instance various cryptocurrency coins and things. And yeah, we had to process those, and we had to process them fairly, and we had to process them at the lowest latency that was reasonable, and very, very reliably. And we used a very interesting design of a messaging system at the very core, the very guts of how it all fitted together, to give us certain properties that we wanted to be able to tell our clients that we had, you know, like fairness and guarantees over certain things, which was very interesting. Yeah, those are cool. Where do you want to start? Do you want to start with the monitoring system, or...

9:38

Well, those are mine. Are there any others that you can kind of throw into the mix here?

9:44

I mean, I think, in general, receiving market data itself, that is the information that exchanges emit, the exhaust from an exchange. So the publicly visible information, for some definition of public, about what's going on in any particular market is disseminated as a set of discrete messages whose format is ordained to you: you get a PDF from the exchange, and they say, this is how we're going to do it. But you have to be able to keep up and read and process that. So there is a message processing system there, and that's the thing I have the most experience with, but I don't get the choice of designing it. I just have to make sure I hit the spec of what's going on there. So I don't think of that in the same way as the other things. So let's just stick with yours, and I'll see if anything rings a bell with something that I have done.

10:33

Okay. But yeah, so, examples of things to do and not do. In the sort of latency-constrained world that I was living in with that event trade, and I would imagine in other places where you have latency constraints, you need to be very careful about the messages at rest, right? So a more dysfunctional form of this, I think, is you're building a messaging system, but in the middle of your messaging system, you put a database.

11:02

So you write data into the database, and then you have some other thing that is pulling data out of the database.

11:03

Right.

11:10

And it's maybe got a cursor or something, where it's like, you know, I'm at row 1000.

11:14

Right. You're tailing a log, effectively. That log just lives in a database, and, yeah, you're just following it down: an insert on one side and a 'select the next thing' on the other.
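
A sketch of the shape being described, assuming a simple SQLite table as the "message bus"; the table and column names are invented. The polling sleep is exactly where the latency and load problems creep in:

```python
import sqlite3
import time

db = sqlite3.connect("bus.db")
db.execute("CREATE TABLE IF NOT EXISTS messages (id INTEGER PRIMARY KEY, body TEXT)")

def produce(body: str) -> None:
    db.execute("INSERT INTO messages (body) VALUES (?)", (body,))
    db.commit()

def consume() -> None:
    cursor_pos = 0  # "I'm at row 1000"
    while True:
        rows = db.execute(
            "SELECT id, body FROM messages WHERE id > ? ORDER BY id",
            (cursor_pos,),
        ).fetchall()
        for row_id, body in rows:
            print("processing", body)
            cursor_pos = row_id
        time.sleep(0.1)  # the poll interval puts a floor under your latency
```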

11:23

Yeah. And the terrible thing about those designs is that they kind of sort of mostly work a little bit, right?

11:25

Right.

11:30

So it's easy to trick yourself into thinking that you have something that will scale and you're like, oh yeah, you know, this database scales, I don't know, whatever, it's some cloud database and it scales infinitely, right?

11:41

Or I've got, you know, some cluster of these things and I can just scale it out horizontally.

11:42

Right, right, right.

11:44

But, you know, there's not really any magic there. If you've just got one table and you're writing things into the one table and you have lots of things reading from the one table, you need to really understand what that database is intended to do and what it's capable of doing, and maybe ask the question in that case: do we need something more like Kafka?

12:05

Do we need something that is more of a traditional queue?

12:06

Right, because, I mean, not to throw anything in your way, but a good friend of mine once suggested that using a sequence of numbered files is a perfectly reasonable way of sending messages between systems.

12:19

And that's true as well. So I don't think you're saying that a database is not a solution to some problems, but certainly when latency is important, you've got too much non-determinism and there's too many moving parts.

12:20

Yeah, yeah. Right, yes.

12:31

So what do you do if you have a latency-sensitive application that needs to be able to react as fast as you possibly can, and you still want it to be a message passing system?

12:38

Mm-hmm.

12:43

Okay. I mean, again, we're calling on some of our prior experiences here. Not storing the messages, right: having the sender and the receiver directly sending messages to each other, either over, you know, TCP or some sort of reliable multicast protocol, for which you can Google various options and see what you like.

13:05

I was going to say, that's a whole episode.

13:07

Yeah, right. That is a great way to reduce that latency. It does put constraints on the consumers, depending on exactly how you do it, to either not create back pressure or to deal with that back pressure in some way.

13:14

Yep.

13:21

Like, you know, the fundamental question to ask is if the consumer doesn't consume the data, what happens? Right?

13:27

Yep.

13:28

Where does it live? Does it get dropped? Does it get stuffed somewhere else that it reads later? And how would it ever possibly catch up? So there are all sorts of concerns to think about there. But fundamentally, if you've got something where you've got some latency constraint, attacking that problem as 'I'm going to write my messages into some sort of storage thingy and then read them back out again'...

13:51

You just need to be really careful about what kind of latency that's going to introduce and maybe just going directly is better.

13:57

Right.

14:00

Right, and I suppose in the limit, if you can do this, which obviously we've kind of glossed over already, being on the same physical computer means that you can use shared memory transport type things and a queue that lives only in memory.

14:08

Mm hmm. Mm hmm.

14:13

So there is a queue, but only because you have to have somewhere to put it, you know. So a double buffer, or even, in the limit, I'm writing to this thing in process A, and process B is just waiting for the okay to read from it as soon as it's finished being written to.

14:29

But, you know, all the things that I've been thinking about so far have involved some network traffic happening in a more distributed system than something that can be literally co-located.

14:40

Because, of course, in an even more limiting case, they're in the same thread and they just literally have memory mapped, and, you know, a global variable is being set, or a shared variable, I should say.

14:45

Mm hmm.

14:50

Yeah, so storing the data is sort of orthogonal to... or sorry, durability of the data. You don't always need durability.

15:01

Something like Kafka will always give you durability. And as you say, that's the thing that stores it kind of first, and then everybody gets a copy of it from the brokers that have already stored it.

15:08

There's a quorum basis here, and, you know, we know that before anyone sees a message that has been sent, some configurable amount of durability has taken place, such that you know that that message has not been lost.

15:08

Mm hmm.

15:21

And it will definitely be there again if you have to go back and get it. And then there's something on the back end as well where you can say, I know that this message definitely got processed by at least one of the people that were supposed to do anything with it. And that's really, really good when you're talking about things like financial transactions and other things where it absolutely needs to happen.

15:39

We need to have a journal of record. And that journal is more important than the latency hit we take.

15:40

Mm-hmm. Yeah.

15:44

In the case of your event trade, presumably,

15:46

Mm-hmm.

15:47

if you dropped a message, or, again, with back-pressure-related things here, maybe dropping the message is okay, because it's better to have that message be missed by the one slower consumer than it is to hold up the fast people and potentially cause them to fire an order too late, or some other issue there, right?

15:53

Mm hmm.

16:09

Yeah. Another actually interesting dimension of that particular system, which I think is worth talking about, is that the messages were not sequenced.

16:20

We had lots of different messages coming in from different data centers.

16:21

Interesting.

16:25

that were all hitting the same system. And it didn't really matter what sequence they arrived in, right? The system could deal with that in different ways.

16:30

ah Oh, that is interesting. Yeah.

16:33

But oftentimes, it is very useful to be able to sequence a stream of messages because that allows you to do things like create a state machine

16:37

Yes.

16:41

And then any consumer of that stream should be able to reproduce the same state of the state machine from the sequence of events. And obviously a classic example of this in finance is building a book. But there are lots of situations in which you want to have a sequenced stream of events that you can use to reproduce state in any consumer that sees that stream.
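
A minimal sketch of that idea, with an invented order-book-like event format: the same sequenced events, applied in order, give every consumer the same state:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    seq: int        # position assigned by the sequencer
    kind: str       # "add" or "remove"
    order_id: int
    price: float = 0.0
    qty: int = 0

class Book:
    """Deterministic state machine: same events in, same state out."""
    def __init__(self) -> None:
        self.orders: dict[int, tuple[float, int]] = {}

    def apply(self, ev: Event) -> None:
        if ev.kind == "add":
            self.orders[ev.order_id] = (ev.price, ev.qty)
        elif ev.kind == "remove":
            self.orders.pop(ev.order_id, None)

def replay(events: list[Event]) -> Book:
    book = Book()
    for ev in sorted(events, key=lambda e: e.seq):  # strict sequence order
        book.apply(ev)
    return book
```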

17:01

Right, this is like log-structured journals of information, like databases and things; you just need to be able to process them in strict sequence.

17:07

Mm-hmm.

17:09

So you mentioned building a book, in our world, which is taking this multicast data that flows from the exchange and applying it as the set of modifications to an empty state, to bring your world up to date with whatever orders are flying around and are currently active.

17:27

And you absolutely have to apply them in the right sequence, or else things go horribly wrong. But in that instance, there is a single producer; at least for any one book, there is exactly one producer that can give you a sequence number.

17:41

And therefore you can see if the messages arrive in order. And so that's an easier proposition. And again, for those folks who are thinking of TCP: if you've got a single connection that's TCP one end to the other, then the messages that are being sent aren't going to be reordered anyway; that's a property of the transport. But in general, for the kind of UDP messages that we talk about in finance, that's not true. And you need to be able to see if you either have received messages out of order, or you've seen that you in fact missed one that you need to go and get from some other place.
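
A sketch of what that gap detection might look like for a single-producer sequenced feed; `process` and `request_retransmit` are stand-ins for your handler and whatever recovery channel the feed provides:

```python
class GapDetector:
    """Track a single-producer sequence; spot reordered and missed messages."""

    def __init__(self, process, request_retransmit) -> None:
        self.expected = 1
        self.process = process
        self.request_retransmit = request_retransmit  # hypothetical recovery path

    def on_message(self, seq: int, payload: bytes) -> None:
        if seq == self.expected:
            self.process(payload)
            self.expected += 1
        elif seq > self.expected:
            # Gap: we never saw [expected, seq); go get it from somewhere else.
            self.request_retransmit(self.expected, seq)
        # seq < expected: a duplicate or late packet we already applied; drop it.
```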

18:14

So that's an interesting property, again, of messages. So we've already talked about durability as one sort of dimension. Another dimension is: what are the constraints on reproducibility and sequencing, which kind of go hand-in-hand?

18:27

So just to take another point here: with something like Kafka, by putting it through a broker, somebody who's responsible, at least for a single partition in Kafka, you have also, as well as the durability guarantees, got a single place of record where the ordering is kind of set in stone.

18:37

right

18:47

And so a subsequent read of that will give you back the things in the same order that everyone saw it in. And that's a useful property in some cases. But going back to your event trade, you are saying that that's something that you could actually tolerate.

18:59

And in fact, you didn't want to take the hit for receiving from multiple systems, right?

19:02

Right. Right.

19:04

Yeah, the sequencing process would just slow that down, so we couldn't do it, right? You have to just design the system to be tolerant of that. But I think something that's really important to understand, and this is true of Kafka, and might just be a general CAP theorem thing: if you're going to get a sequenced stream of events,

19:23

then it can be very difficult to build a system that can scale horizontally with that constraint. Because something has to be, you know, the sequencer.

19:30

Right, the arbiter of what time things happened, which came first. Right, yeah.

19:34

Which came first? Yeah. Uh huh. And in the particular case of Kafka, I forget topic versus partition and exactly how that works. But the thing that gives you that ordering guarantee cannot scale horizontally.

19:51

That is, yes, the partition within a topic. So topics can have multiple partitions, and those partitions are kind of the unit by which they are given to individual members of the Kafka cluster, and of course you can have multiple processes and threads and whatever. So essentially, by sending to a single partition, you're sending to a single...

19:51

right Yeah.

20:07

...single destination, and that's the thing that gets to decide, but there's only one of them. If you need to go faster, you need two of them, and now suddenly you no longer have this nice guarantee of a total ordering.

20:21

And that's what we're talking about here, a total ordering.

20:21

Right, yeah. So there are some important trade-offs to consider there.

20:28

So why not just use the time as the total ordering?

20:33

[Laughs] Well, how much time do you have? Pun intended.

20:36

Well, you said you had an hour, so I'm taking you at your word.

20:38

Well, so to start with: what precision? Because whatever precision you choose, you're going to get some amount of collision, right?

20:47

All of the precision.

20:51

These two events happened at the same nanosecond. Which comes first? I don't know, right?

20:56

I mean, ah yeah, yeah, no, exactly.

20:59

Right, like that's not a deterministic sort order, right?

21:03

And yeah, you think that never happens, and then, you know, the birthday paradox kind of thing means that it happens a little bit more often than you would otherwise naively think.

21:14

But yeah, I'm going to admit here: we did use nanoseconds since 1970 as, like, a global key for packets arriving in one of the products I worked on a number of companies ago.

21:26

Mmhm.

21:28

And the solution there was a post-process that arbitrarily picked one of them if it found two that had the same timestamp, and just added one nanosecond until it didn't match anymore, right?
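
A sketch of that post-process, assuming the keys are integer nanoseconds since 1970:

```python
def dedupe_timestamps(ns_timestamps: list[int]) -> list[int]:
    """Bump colliding nanosecond keys forward until every key is unique."""
    seen: set[int] = set()
    out: list[int] = []
    for ts in ns_timestamps:
        while ts in seen:  # collision: arbitrarily nudge this one by 1ns
            ts += 1
        seen.add(ts)
        out.append(ts)
    return out

# dedupe_timestamps([100, 100, 101]) -> [100, 101, 102]
```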

21:35

Yeah, right, right, right, right, right.

21:37

Pragmatically, it mostly never happens. But when it does, it really blows your system up. So yeah. And then, so, how much precision?

21:43

Yes, it's.

21:45

Great question. And you know, you and I have been fortunate enough to work in the finance industry, where we already like to have accurate time. So getting time that's accurate to within low digits of nanoseconds is feasible for us. But for most people that isn't an option: you can get milliseconds at best, and NTP will get you within plus or minus fifteen, maybe twenty milliseconds. You know, better than two people synchronizing their watches in an old spy movie, but not that much better.

22:05

Right, right.

22:14

Yeah, yeah, yeah. And I do think it's sort of that false precision problem that leads you into this trap where you're just like, well, this nanosecond precision timestamp, what are the odds, like they can't even physically arrive at the same time.

22:26

Like the photons don't move like that. It's like, okay, but then what happens when your clocks are just off, right? Like you're just, they're just not that precise. And so you get two things that have the same timestamp because your clocks just aren't that precise.

22:34

Right, and, you know, as soon as you have more than one cable, the photons don't move that way, but you can have two parallel streams of photons that do arrive at exactly the same time. So it can and does happen. So yeah, you can't just use time. And anyway, whose time are we talking about? Because, you know...

22:55

Right, right. Now we're getting into the whole problem. This is a whole other category of this, which is clock domains, right? Like synchronizing time between multiple computers is hard.

23:09

It requires thought, and oftentimes specialized equipment. And if you just sort of take it for granted that all clocks everywhere are the same, you're setting yourself up for a lot of hurt; like, the hurt is coming for you.

23:23

Right.

23:23

So anytime that you're gonna be comparing time, you need to be thinking about what the source of those clocks is, and how precise are they, and how accurate are they, and how are you gonna deal with the differences between them, and what are those differences?

23:39

What can they be, and, you know, what are the things there? So it can go all the way from, you know, we've got a GPS antenna that's sitting on the top of the building. And we know the precise geographic coordinates of that antenna. And we know how long the cable is from that antenna to all of the various servers that are using that antenna to synchronize their time. And from the length of those cables, we can compute the drift from the received signal and the antenna to each of the individual computers, right?

24:08

And unless you're taking that level of precaution or something kind of like it, I would not trust any nanosecond timestamp to be greater or less than anything else, right?

24:13

You've even missed out some bits there. You know, when we were doing stuff at previous companies, there would be a rubidium-based oscillator with a very high...

24:25

Yeah.

24:26

You know, there's an oven that's got rubidium at some temperature, and that's the thing that you synchronize with the GPS, and everything synchronizes to that with some complicated protocol, and...

24:32

yeah

24:35

Yep.

24:35

Yeah, well, I say complicated. This is my favorite protocol. And I remember one of our network engineers saying to me, yeah, we use PPS to synchronize the master clock with the individual clocks on each of the machines. I'm like, PPS, wow, what's that? Because I've heard of NTP, and I've heard of PTP, but PPS? And he's like, it stands for pulse per second.

25:00

And literally, it goes to five volts once a second, on the second, and I'm like, oh, right, that's the protocol.

25:04

Yeah, yeah, yeah.

25:07

Just on and off, got it.

25:09

This is good. It's a simple protocol.

25:10

It's a simple protocol. But yeah, again, you talk about the leads: the cables were very carefully measured and very carefully designed so that the delays they brought in were understood.

25:21

So yeah, it's complicated.

25:21

Yeah, yeah, yeah. Right, right.

25:23

And reasonable people could disagree, because, yeah, you can have a data center full of things that uses your discipline for clock synchronization, which you're maybe happy with.

25:25

Oh, yeah.

25:33

But if you take a message from, say, an exchange, and the exchange says, hey, this happened at this point in time, you have to trust their ability to manage that if you want to say, well, why don't we use their clocks; whatever we're doing on our side, forget it, let's just use the clocks from the remote people. We have been through this process. You're like, well, that makes sense. Surely they have done something sane. And then of course, what if they haven't? I mean, not to throw aspersions at our friends who have a difficult job maintaining these systems, but...

26:02

Yeah.

26:03

Things have gone wrong before, and then suddenly you're thrown into a world of hurt, because time went backwards by tens of nanoseconds, and you're like, no, I always expect time to go forwards, because, you know, that's one of the few truths: along with taxes and death, time goes forwards.

26:19

Nope, you think it does. But I mean, I think that raises a really good point, which is: one way that you can get around this time synchronization difficulty is to never use the system time of the computers that are in the messaging system, and embed time in the messages, right? And then the ultimate source of the messages is the thing that has to have a reasonably accurate time,

26:44

but the sense of time for all of the downstream systems just comes from that. And that is really important if you want to do what we were kind of talking about earlier, where you have a sequence of messages and you're trying to reconstitute state based on that sequence of messages.

26:58

If there's any sort of time processing that has to happen, then embedding the time in the messages allows you to reconstitute that state retroactively, right?

27:09

So you can go back and you can replay the messages from three months ago

27:09

Yep.

27:13

and reconstitute whatever state that you have, even if it depends on time, because it doesn't depend on the clock of the computer that's just running the simulation or the reproduction; it's extracting that time from the messages themselves.

27:20

Right.

27:25

So you will always get exactly the same result.
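
A sketch of that replay discipline, with invented message fields. Any time-dependent logic reads the clock from the message, never from the machine, so replaying three-month-old messages produces identical state:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Msg:
    event_time_ns: int  # embedded by the original producer
    order_id: int
    kind: str           # "place" or "cancel"

TIMEOUT_NS = 30 * 1_000_000_000  # expire orders after 30s of *message* time

class Expirer:
    """Expires orders based on message time, so replays are reproducible."""

    def __init__(self) -> None:
        self.live: dict[int, int] = {}  # order_id -> placed-at (message time)

    def apply(self, msg: Msg) -> None:
        now = msg.event_time_ns  # never time.time(): the messages drive the clock
        self.live = {oid: t for oid, t in self.live.items() if now - t < TIMEOUT_NS}
        if msg.kind == "place":
            self.live[msg.order_id] = now
        elif msg.kind == "cancel":
            self.live.pop(msg.order_id, None)
```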

27:28

Yeah, just to take a temporary diversion here, this is one of the things that, in the code base that I was working on, we used different types for the different kinds of time. So they were literally not comparable or convertible between each other without an explicit thing I could search for in the code saying: we're doing this, we're crossing clock domains right now. I am trying to look at the current time as measured by whatever process has given me the time on my computer, and I'm comparing it to the message time that was embedded in the message through some mechanism, and I have to know that that comes with this huge bag of caveats. It's sometimes useful to do it, because one thing you might want to do is measure the skew between the two, just to graph it somewhere, or just to keep track of it, or just to alert if it gets more than a few hundred milliseconds or something out. So you do want to be able to do it, but you definitely don't want to be able to do it just by saying `time t = clock.now - message.time`.

28:22

Yeah, yeah.

28:23

It should be, no, that's a syntax error, right? The thing is going to fail to compile there. You have to do some work here. And, you know, that's always been a worthwhile thing I've found to do.
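
A sketch of that distinct-types-per-clock-domain idea. The original context was a statically typed codebase where mixing domains fails to compile; in this Python approximation, a checker like mypy flags it and the runtime check fails loudly. The class names are invented:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WallClockNs:
    """Time from the local machine's clock."""
    ns: int

    def __sub__(self, other: "WallClockNs") -> int:
        if not isinstance(other, WallClockNs):
            raise TypeError("crossing clock domains requires an explicit call")
        return self.ns - other.ns

@dataclass(frozen=True)
class MessageTimeNs:
    """Time embedded in a message by some remote producer."""
    ns: int

    def __sub__(self, other: "MessageTimeNs") -> int:
        if not isinstance(other, MessageTimeNs):
            raise TypeError("crossing clock domains requires an explicit call")
        return self.ns - other.ns

def cross_clock_domains(local: WallClockNs, remote: MessageTimeNs) -> int:
    """The one searchable place where domains are deliberately compared,
    e.g. to graph skew or alert when it drifts by hundreds of milliseconds."""
    return local.ns - remote.ns
```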

28:35

And even within a computer, you know, there are different clocks. You've got monotonic clocks that are guaranteed to not go backwards. You've got clocks that try to adjust because of the NTP drift as they're readjusting themselves.

28:46

You've got like the CPU cycle counter, which is measured in its own domain.

28:50

Mmhmm.

28:51

So this is something that's useful to have more generally. Gosh, this is really going off topic, isn't it? This is great. But no, it's a really important thing to know about. I think it's worth saying as well, just because it's cool, that it is possible to get networking hardware to add a timestamp onto the end of packets that flow through it.

29:12

Mmhmm.

29:12

So there are certain switches that you can configure.

29:13

Mmhmm.

29:15

You can plug them into this PPS and get them to synchronize with your very accurate timestamp. And then every message that flows through that switch gets a payload tacked onto the back of each packet, after the end of what would normally be the UDP packet or the TCP packet or whatever, and you need to use exotic mechanisms to go and actually pull those bytes out, but they are there. And then you can have a source of truth, maybe at the edge of your network: as things come in from the outside world, you say, well, this is where we're going to timestamp it. And that's useful for reconstituting the sequence in which they arrived at the edge, which is not necessarily the order that they arrived at you, because cables can vary within the system and routes within your system can vary, but it gives you something to measure things by. And in particular, when you're doing some of those more latency-sensitive things that we were talking about, having a sort of ground truth comparison: you can look at that timestamp for the thing that came in, and look at the timestamp of your message that went out

30:11

of the system, and that's literally how long it took, warts and all, every network hop. Anyway, that's one of the many sources of clock domains. And we were talking about clock domains in the context of ordering.

30:22

So yeah, go ahead.

30:22

Yeah. Well, and that actually brings up another topic, which is that timestamping is an example of something else that is a really good practice, which is tracing. Right? As the message flows through your system and as it's being processed at each stage, it is quite often useful to be able to embed in the message, or maybe as a wrapper around the message, depending on how you do it,

30:32

Yes.

30:46

information about the tracing. And that can be useful for performance. It can be useful for error debugging, for just general observability: figuring out, hey, this message failed to process...

30:50

Yep. Well, debugging. Yeah.

31:01

Why? Where did it stop? What problems did it run into? Or it was really slow to process. Why? What was the bottleneck? What was the slow part? And, you know, sometimes you'll do things like creating some sort of identifier at the point of ingestion or message creation.

31:18

And then you can have an external system that refers to the message as it flows through using that identifier. Or sometimes you're literally just adding information into the message object as it's flowing through.
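
A sketch of that kind of envelope, with an invented format rather than any particular tracing system: a trace ID minted at ingestion, plus per-stage hop records added as the message flows through:

```python
import time
import uuid

def ingest(payload: dict) -> dict:
    """Wrap the payload in an envelope with a trace ID at the point of ingestion."""
    return {"trace_id": str(uuid.uuid4()), "hops": [], "payload": payload}

def record_hop(envelope: dict, stage: str) -> None:
    """Each processing stage stamps itself onto the envelope."""
    envelope["hops"].append({"stage": stage, "at_ns": time.time_ns()})

# Usage: each stage calls record_hop(msg, "risk-check"), record_hop(msg, "router"),
# and so on, so when a message fails or is slow you can see where it stopped
# and which hop cost the most.
```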

31:27

Right.

31:30

That, incidentally, is what we used the nanosecond timestamp for, because obviously the hardware on the outside would put this nanosecond timestamp on every packet. We're like, well, that's a unique identifier, except when it isn't.

31:40

Yeah, yeah, yeah, except it's not.

31:41

But most of the time it is. And then it gives you this sort of unique ID, this sort of trace ID, which carries information in its own right, because it's the time that it arrived as well.

31:53

Yeah, unfortunately, not always unique. No, I've variously seen this as, you know, "provenance" or "tracing" or "causality". And I know the OpenTelemetry project keeps being pointed out to me, and I'm going to start looking at that soon.

32:11

I keep meaning to. They seem to have a whole bunch of stuff around the telemetry of systems more generally, but I wonder if they have something that can also be used to correlate.

32:22

That's another one, "correlation IDs" and things like that. One event, and the causality as it traces through your system, and you see all the different events. I mean, even on just a website, seeing that someone clicked a button and caused an error, and being able to say, well, the backend error was caused by this click over here, is useful.

32:40

Anyway, sorry, again, really off base here, but yeah.

32:43

No, I mean, I think these are all these are all dimensions of this problem that you need to be thinking about if you're going to build systems like this, right?

33:00

We've talked about various dimensions of messages so far. We talked about durability, we talked about sequencing, we've now talked about tracing, which sort of ties into determinism. And we opened with, you know, don't put giant blocks of data into your messages.

33:28

Yeah, yeah.

33:28

And we said, be very careful about which clocks you use. What other considerations are there?

33:41

I mean, so how would your monitoring system... Let's just think a little bit about the monitoring system.

33:45

Yeah, yeah, yeah.

33:46

So that had a very, very high rate of inputs. Essentially, it was a centralized monitoring system for the whole company's services. All the services could send all the stats they wanted to it.

33:56

yeah

33:59

And you had to deal with it.

34:01

Yeah, I'll tell you one mistake that we made, and, you know, good judgment comes from experience, and experience comes from bad judgment.

34:02

[laughs]

34:09

And so, listeners, I hope that you get to benefit from all of the bad judgment of the people on this podcast and the hard-won experience.

34:17

And so when I say like you need to be careful about clock domains and you need to think about like where your source of time is, one of the great mistakes that we made very early on in that project, and it's something that just haunted us forever,

34:30

is what we allowed people who were sending messages to the system to do. So the idea behind the system is that you'd have external clients that could send telemetry data or, I mean, basically anything: prices, internal application metrics, whatever they wanted, they could send data to the system.

34:49

It worked a little bit like StatsD, if you've ever used StatsD, but it had sort of more, yeah, yeah.

34:52

Yeah, sort of Prometheus-y type things, but it was designed for more real-time stuff rather than once-a-minute, once-a-second kind of stuff. It was very much like...

34:59

Yes. The idea behind the system was, you know, it's cool that Grafana has a chart that updates once a minute, but we need something that can update many times per second, because it's monitoring trading systems.

35:10

And if something happens, we need to know about it right now. So like human time. But one of the great mistakes that we made with the system was allowing people to put their own timestamps on those messages.

35:21

That was a terrible idea. An absolutely terrible idea.

35:24

It's so easy to do. I can see why you'd want to be able to do this.

35:28

Yes.

35:28

You know, I find this quite often with things like our Prometheus setup. Because, you know, hey, I've got a build.

35:34

Mm hmm.

35:37

I want to measure my build time and I want to post it. And then sometimes I want to go, actually, I want to go back in time and run the last hundred builds one day apart from each other. And I want to populate some data in the database so that I don't just have "now data"; I have historic data once I've realized I want it.

35:52

Yeah, yeah, yeah.

35:53

Right. And so how bad would it be to let me post stuff that's in the past to you so that I can write my data?

35:58

Right.

36:00

Like, you know, it's a reasonable thing to want to do. So what was the drawback? What made you rue that decision?

36:06

Well, because inevitably people want to be able to say like, Oh, and also give me the list of all the messages that were delivered on this day. And now that's just wrong because your timestamp and my timestamp don't line up for whatever reason, right?

36:16

Right.

36:19

It could be that you pre- or post-dated your thing, but you did the calculation wrong.

36:25

It could be that what you actually want, when you say 'the data that was delivered on that day', is the data that was delivered on that day, and not whatever timestamp it had because it came out of your log file or whatever.

36:34

Well, this comes back almost to the bitemporality thing.

36:39

Bitemporality, yes.

36:40

It's like, you know, there's the time that I got it. And that's the kind of knowledge time. When did I know that you said that you wanted this thing?

36:46

That's one timestamp. And then the other timestamp is: what time did you say you wanted this thing to be known as of, or related to, rather.

36:46

yeah Yes.

36:51

Yes.

36:53

And in almost all situations, those two times are coincident, or so close that nobody cares. But not always.

37:02

Mm-hmm, mm-hmm.

37:03

Right. And I think that's one of the harder things. I don't know if we've ever talked about bitemporality. Maybe we have. I don't know.

37:08

I don't know.

37:09

We must have done in passing. That's a whole interesting world as well.

37:10

Yeah.

37:12

You know, yeah. You want to say, on this day, what messages did you send me?

37:19

Mm hmm.

37:21

And then you want to say, on this day, what samples fall in this window? Which is different from when did you tell me about those samples?

37:26

Right.

37:29

Right. I mean, again, they're mostly the same.

37:29

Right, right, right.

37:31

But yeah, that's OK.

37:31

Yeah, yeah, if I had it to do over again, what I would have said is no, you cannot specify the timestamp, but you can, and this was true already, you can put whatever data you want in your message and you can query based on any of that data.

37:35

Mm hmm.

37:46

So if you want to have your own log timestamp or ingestion timestamp or whatever, you can add that as a field to your message.

37:51

Yeah.

37:55

My system will be blissfully ignorant of it other than it's another field that you can do stuff with and you can do whatever you want with that timestamp.

38:04

Yeah, that is your piece of data to do with as you wish, but we know when it arrived with us, and that's all we're going to keep as the sort of primary thing.

38:12

Yeah.

38:14

Yeah. yeah

38:15

Yes. Also, speaking of timestamps: please, please, please do not put localized timestamps in your messages.

38:22

Oh.

38:23

It can be nanosecond precision, it can be millisecond precision, it can be second precision, I don't even care, but it's a number. Please just put a number in there. Don't put some parsed string with a time zone offset.

38:34

Yeah.

38:36

No.

38:37

No, and store it in UTC for this kind of thing or some well-defined never-changing thing.
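
A sketch of both rules together, with invented field names: the system stamps the canonical arrival time itself, as a plain UTC integer, and anything the client sends, including their own timestamps, is just ordinary queryable data:

```python
import time

def ingest(user_fields: dict) -> dict:
    """Accept arbitrary client data, but stamp the arrival time ourselves."""
    return {
        # Canonical, server-assigned, UTC nanoseconds since the epoch:
        # a number, not a localized string.
        "ingest_ns": time.time_ns(),
        # Clients may include their own "log_ts" or whatever as ordinary
        # fields; the system treats them as opaque, queryable data.
        "fields": dict(user_fields),
    }
```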

38:41

Yes.

38:45

I think, I don't know to what extent it's an open secret or not, but a very large web search company, to this day, to the best of my understanding, still logs everything in West Coast time, which means that

39:02

its logs and the graphs that go with them have, twice a year, either a big gap or a weird doubling back on themselves type of thing.

39:09

Mm hmm.

39:10

And it's just that the cost of changing it is so high that it hasn't been done.

39:12

Mm hmm.

39:14

But yeah, there's a time and a place for localized time. And it is in application-level things: if you're trying to talk about what time a trade happened on a particular exchange, it is useful to specify it in the local time of that exchange, say, because you know that the exchange opens at 8.30 local time on that day and closes at 3.30 local time on that day.

39:23

Yes.

39:43

But if you have to sit and try and work things out, or do anything other than compare with straightforward arithmetic operations on a 64-bit number, then you're doing something wrong.

39:55

If you have to work out what day that was, and then, was it daylight savings or not? Wait a second, that was in Europe, wasn't it? And they don't do daylight saving...

40:03

Absolutely. Like religion and politics, time localization should only be discussed in the home. The international standard is a 64-bit number.

40:13

And only when you're displaying it, or viewing it, or making a report, do you ever take that 64-bit number and turn it into some localized time that is localized for the person who is viewing it, right?

40:26

Yes, or whatever it is.

40:26

Or the system perhaps that is viewing it. But yes, yes, right.

40:28

Yes, that makes sense. And then, yeah, nanoseconds since 1970 is not a bad thing to fit into 64 bits.

40:39

That'll get you to, I can't remember when, but it's far enough in the future that, at least right now, I don't have to worry about it before I retire. Although, you know, I'm an old man, so maybe the younger folk will have to worry about it.

40:51

Mmhm

40:51

But there are any number of ways of storing time like that, or, you know, you can pick your own epoch, right? It doesn't have to be 1970, but 1970 is convenient, because then you can use...

40:59

Yeah.

41:04

Right, right.

41:04

the Unix date command to move back and forth. In fact, I've checked in all my dot files. Sorry, this is another sidetrack, but one of the fish shell functions that I use is to convert numbers from an epoch time to a displayable time and backwards, right? So I can do epoch and then just type a number in, and based on however many digits it's got, it guesses whether it's millis, micros or nanos, and then it prints it out in my current time zone. And it is the single most useful thing. I know people go to epoch-converter.com, which drives me bonkers to see. You know, why would you go to a website with all these flashing ads and things on it, just to convert some numbers, when it's something the command line can do? But on the other hand, it's a pain to do.
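
A rough Python equivalent of that function (the original is a fish shell function; the digit thresholds here are an approximation of the guessing trick):

```python
from datetime import datetime, timezone

def epoch(n: int) -> datetime:
    """Guess seconds/millis/micros/nanos from the magnitude, then convert."""
    digits = len(str(abs(n)))
    if digits <= 11:       # seconds (good until roughly the year 5138)
        secs = float(n)
    elif digits <= 14:     # milliseconds
        secs = n / 1_000
    elif digits <= 17:     # microseconds
        secs = n / 1_000_000
    else:                  # nanoseconds
        secs = n / 1_000_000_000
    return datetime.fromtimestamp(secs, tz=timezone.utc).astimezone()

# epoch(1739491200) and epoch(1739491200000) both land on 14th February 2025.
```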

41:44

Yeah. Or you can just open up a JavaScript console in your favorite browser and paste the timestamp into `new Date()`.

41:51

Yeah, that's true.

41:51

And that'll also give it to you.

41:54

That's a great one.

41:55

yeah

41:55

I'm remembering that one.

41:55

Yeah, it's super, super convenient most of the time.

41:56

That one's even more portable than mine, yeah.

42:00

Another thing to think about here, and this is kind of getting back to what I was saying: don't put a database in the middle of your messaging system, right? Generally. Sometimes it's fine.

42:07

Right.

42:09

And, you know, as you said before, sometimes it's just a file. But, okay, if I can't do that, then how am I supposed to bridge the gap? Because there will almost certainly be a gap between the world of stream processing systems and batch processing systems.

42:24

Right, at some point, someone's going to want to run a database query or something on your data.

42:30

Right.

42:31

Right. And how do you handle that? And also, this kind of ties into the durability thing, where if you don't have a system like Kafka, or some other sort of durable queue, in the middle of your system to keep track of the history, you know, you just have UDP packets or you have something else, then what should be responsible for keeping the historical record of everything that has ever happened, right?

42:55

Right. Right.

42:57

So I...

42:57

Which obviously some people don't need and that's fine.

42:59

If you're a video game server and you've got the player positions that are being updated, then maybe you don't need a log for all time.

42:59

Right.

43:02

Yes.

43:06

But you know, if you're working in finance, it's generally a good idea to keep everything forever for all time in case somebody comes and asks you a very awkward question about what happened.

43:06

Yes, yes.

43:15

Yeah, yeah. And this ties in also to another thing that we were talking about: reproducing state for state machines. So the cool idea is, all right, I'm going to take my messages, I'm going to pass them into some system that processes them, there will be no other information that goes into the state machine other than the messages themselves, and therefore I can completely reproduce the state from the sequence of messages. It's like, yes, that's cool. But what happens when you have seven years' worth of messages and you have to start at the beginning?

43:43

Right, right, right.

43:45

That seems bad. So one of the things that you typically do is you have something that is consuming the stream of messages whose purpose is to store them and also potentially snapshot them.

43:58

Right. So you have something that is consuming the messages, it's writing them into some persistent store, maybe it's even transforming them into something that can fit into a database table or some other format that is nice for bulk processing. And another thing that it might be doing is running this sort of state machine and taking a snapshot at some regular interval, and then putting that into the storage as well. So that when you need to reproduce the state for some particular point in time,

44:28

rather than having to play all seven years' worth of messages through your system, you can jump to, you know, a prior but recent snapshot, and then load that state into your system and only replay the messages forward from there.

44:41

And that will be much faster and much more efficient.
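
A sketch of that snapshot-and-replay arrangement; the archive and the state machine here are invented in-memory stand-ins for a real persistent store and your real processing logic:

```python
import pickle

SNAPSHOT_EVERY = 1000  # messages between snapshots; tuning is context-dependent

class Archive:
    """Stand-in for a persistent store of the journal plus periodic snapshots."""
    def __init__(self) -> None:
        self.messages: dict[int, dict] = {}    # seq -> message
        self.snapshots: dict[int, bytes] = {}  # seq -> pickled state

def apply_message(state: dict, msg: dict) -> None:
    # Stand-in deterministic state machine: a running total per key.
    state[msg["key"]] = state.get(msg["key"], 0) + msg["delta"]

def run_archiver(stream, archive: Archive) -> None:
    """Consume the stream, journal every message, snapshot periodically."""
    state: dict = {}
    for seq, msg in enumerate(stream, start=1):
        archive.messages[seq] = msg
        apply_message(state, msg)
        if seq % SNAPSHOT_EVERY == 0:
            archive.snapshots[seq] = pickle.dumps(state)

def recover(archive: Archive, target_seq: int) -> dict:
    """Load the latest snapshot at or before target_seq, then replay forward."""
    base = max((s for s in archive.snapshots if s <= target_seq), default=0)
    state = pickle.loads(archive.snapshots[base]) if base else {}
    for seq in range(base + 1, target_seq + 1):
        apply_message(state, archive.messages[seq])
    return state
```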

44:42

Right.

44:45

Right. Provided there exists a sensible snapshot format, which is interesting.

44:51

Right. Right.

44:52

So I think this has now sort of moved into what I think of as a log-structured journal: you have some database or in-memory representation of the world that you update through seeing these events.

45:05

For some things, so for example, to build the set of live orders on an exchange, that is, the prices of, like, Google, and all the people that are trying to buy and sell Google, you can unambiguously snapshot that state and go, okay, at this point in time, at nine in the morning, these are everyone's orders. And now if you just load up this nine a.m. snapshot, you can carry on. You don't have to have loaded up the seven a.m. ones, or the whole day. That's fine, right? But as soon as you start getting to things that have state that is non-trivial,

45:40

now it becomes a function of the processor of that state. So let me give you an example. What if you were keeping some kind of exponential moving average of some of the factors of that?

45:52

That depends on how long the window of your exponential...

45:55

Uh-huh.

45:56

is, and some other properties of that. What do you count? Which kinds of information go into that or don't? And now you've got a complicated piece of state that is arguably different for every client. You know, maybe some people care about a 10-day look back, and then other people want a five-minute look back. And so that gets kind of tricky. I don't know where I'm going with this now, but it's not as straightforward for application domains if they have any kind of state that requires some history in order to get to the point, other than the pure individual add/remove of, say, a book, unambiguous stuff, yeah.

46:37

Yeah, and that state can get quite large because of these constraints. And I think this is something that is really important to think about, because this kind of snapshotting becomes very important when you think about error recovery, right?

46:54

And there's two dimensions of error recovery that I think we we can talk about here. One is you've got some consumer of the stream and it's crashed. And now you want to restart it, right?

47:02

Right? Yep.

47:05

What state do you need to let it rejoin the stream, right? Again, do you have to go back to the beginning of time and process seven years of messages for your system to restart? That's gonna be bad, right?

47:15

Oh, we'll fix it next year when, yeah, it only gets worse, yeah.

47:17

Yeah, right. Yes, we've rebooted it and the website will be back online in 2038. So you have to think about the state if you want to be able to recover, and you need to think about how you can reasonably snapshot that state if you want to be able to spin something back up and have it rejoin the stream, right?

47:42

And so I think you have to consider that from the very beginning. How big is the state? How often can we snapshot it? What is our acceptable amount of downtime here for these various things?

47:54

Is it an hour? Is it a minute? Is it, you know, a month? And how are we going to be able to rejoin this processing? Otherwise, we can never turn this software off, right?

48:06

Right, which is an option. Just don't write any bugs.

48:12

Yeah, right.

48:12

And don't have any hardware faults... we'll be golden!

48:13

[Laughs] Yes. Another dimension to think about with fault tolerance in these systems is poisoned messages.

48:23

Yeah.

48:23

Right. So that is a very common situation where there's a bug in your system, or a bug in a producer system perhaps, and you receive a message

48:34

that you can't process, right? And redundancy here will not save you. You can have 10 redundant systems that are all consuming the stream and processing the messages, so that if one runs out of memory or whatever, the other nine are there.

48:48

But if they all have the same bug and they all get the same message, the whole point of the distributed state machine is that they are all going to do the same thing, which is not process your message.

48:54

Or all crash.

48:56

That means they might crash. All manner of problems can happen here. So one common approach to dealing with these things is creating what's called a dead letter queue.

49:09

So you have a message that comes in and your system cannot process it, but it's able to detect that it can't process it. Maybe it raises an error, maybe there's some validation step, whatever it is, and it's like, I can't process this message.

49:20

They'll all crash.

49:22

So what I'm going to do is I'm going to take it i'm going to put it into another queue, another stream of messages called the dead letter queue.

49:28

Stream of messages, yeah, yeah.

49:31

And it's going to sit there until somebody does something with it. Now, the first thing that you want to do with it is send some kind of notification or alert or something to tell everybody, yes, like you know somebody's getting paged.

49:39

Someone's phone should go off.

49:43

It's like, ah, we just got a message we don't know how to process, right? Um, but if you do that, then depending on the state machine that you're trying to reproduce, if you have one, or just the message processing that you're doing, it can sometimes be OK to say: OK, I'm going to take this message, I'm going to put it in the dead letter queue, and then I'm just going to keep going, right? I'm going to pretend like I never even got this message, because it's malformed or there's some other problematic thing with it, and I'm just going to keep going.

50:11

You can obviously run into situations where there's just a bug in your code and this is a message that you need to process and you didn't process it correctly and now your state is wrong.

50:18

And now you're doomed. Yeah, yeah.

50:20

But there are also situations in which you have one of these messages and it is truly something that is malformed and can be ignored, was never supposed to be created in the first place, and now you can just continue on having this in the dead letter queue.
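
A rough sketch of that detect-park-alert loop in Python. The queue objects, `validate`, and `alert_oncall` here are hypothetical stand-ins (plain lists play the queues), not any particular library's API:

```python
def validate(message):
    """Hypothetical sanity check; raise if the message is malformed."""
    if "type" not in message:
        raise ValueError("message has no type field")

def apply_message(state, message):
    """Hypothetical deterministic state-machine update."""
    state[message["type"]] = message

def alert_oncall(text):
    print(f"PAGE: {text}")  # stand-in for a real paging integration

def consume(main_queue, dead_letter_queue, state):
    """Process messages; divert anything unprocessable to the DLQ and move on."""
    for message in main_queue:
        try:
            validate(message)
            apply_message(state, message)
        except Exception as err:
            # Can't process it: park it where a human can inspect it later,
            # wake somebody up, and keep the rest of the stream moving.
            dead_letter_queue.append({"message": message, "error": repr(err)})
            alert_oncall(f"dead-lettered a message: {err!r}")
```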

50:36

A common pattern that I ah have used with great effect is being able to basically re-drive those dead letter queue messages back into the main queue if sequencing doesn't matter.

50:46

Oh, interesting.

50:48

But if sequencing matters, then you can't do this.

50:51

Right. But if you have a system where there's no sequencer, ah, or there is a sequencer but it doesn't really matter all that much, then you can take these messages and be like, all right, we got this message we don't know how to handle.

51:03

um It went into the dead letter queue. We're now going to change the code so that it can handle this message in some way, redeploy that, and then re-drive the message back into the queue so that it can be correctly processed and flow all the way through.

51:17

Right.

51:17

Right, and that is a really nice way to handle it, if you can.
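
And the re-drive step, continuing the same list-as-queue sketch from above, can be as simple as draining the dead letter queue back into the main queue once the fixed code is deployed, assuming, as noted, that sequencing doesn't matter:

```python
def redrive(dead_letter_queue, main_queue):
    """Push dead-lettered messages back through the (now fixed) consumer.

    Only safe when ordering doesn't matter, or the sequencer tolerates it:
    these messages arrive long after their original neighbors did.
    """
    while dead_letter_queue:
        entry = dead_letter_queue.pop(0)     # oldest first
        main_queue.append(entry["message"])  # flows through the fixed code path
```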

51:21

If you're able to do that, then yeah, that's really nice. Particularly if, for example, this was some, um, you know, holiday booking stream of information, like a centralized holiday booking thing, and then someone comes in and they've just booked some suite and

51:23

Yes.

51:36

the price is higher than you've ever hit before, and some internal issue happens, and you're like, oh, damn, you know, we can't book this for them because it's legitimately a $100,000-a-night thing.

51:48

And that just overflows something, and we're done.

51:49

[laughs] Yeah,

51:50

But you're like, this is really valuable business. Ben, could you hotfix that very, very quickly? Write a test, make it pass, deploy the thing, and then we're gonna put the message back in again, and then the booking goes through, albeit 30 minutes, an hour late.

52:04

At least it gets done, and you capture the revenue, and everyone's happy, and, you know... ah, yeah, that seems like a really nice way to heal the system in that instance.

52:04

Yeah, yeah.

52:11

Yeah.

52:14

But obviously, sometimes it can be a legitimate bug or a malformed message or something something like that.

52:15

Yeah.

52:20

Yeah, and you have to be able to deal with it. Yeah, because as you say, fault tolerance was a dimension that you talked about.

52:24

Right.

52:25

So, ah, another dimension for message processing systems is that, like, things go wrong, computers go wrong, and it's entirely reasonable to have more than one system listening to this stream of messages and independently processing them and updating their own state. And then if one machine breaks,

52:44

Well, you've got two more of them, and that's OK. And then you have to have a system behind that system that determines what the actual outcome of any particular update was. But you've got fault tolerance by scaling through a messaging system. And that's a really interesting solution, and part of the solution that we put together at the aforementioned cryptocurrency trading place, which was a really interesting solution for a number of things that we were doing, wasn't it? It allowed us to do rolling updates of the code, because we could have a quorum of five machines doing the same processing, and then take two of them out of the system, upgrade them, and then put them back in again, and then run them in silent mode and check that everyone still agreed on everything that was happening.

53:29

And then only when we were confident that we hadn't introduced a new bug, we could add them back into the pool and then start rolling over the other three. And there you go. Now you can do rolling upgrades and you're never down. Hooray.

53:40

Mmhmm, mmhmm, mmhmm.

53:41

Um, it let us do things like have different configurations of those computers, be it through different JVM settings or different hardware or whatever, such that if one of them processed the message faster than the others, or one of them had to GC, say, or one of them was doing some JIT work or whatever, um, we could make sure that as long as two or three came up with a good answer that we were happy with, the other two could be slower and that's fine. And that meant that we could hide some of our tail latencies

54:11

Yeah, yes.

54:11

in the quorum, which was, you know... So we got all these, ah, wonderful benefits. And obviously, yeah, if we had an equipment failure, then, you know, two or three of those machines could die and the site would stay up and we'd be able to process transactions and everything.

54:23

And that was super cool. And was, was definitely eye opening to me working there in terms of like, Hey, you get a lot of benefits from doing it this way. That's great.

54:33

Yeah, I had that same experience. And we had done some things like that at my previous company, when we were basically intentionally creating races between systems, because we were trying to get them to run as fast as possible. And it created an opportunity to make the system more fault tolerant, where you'd have, you know, multiple parallel things that are all processing the same stuff, and the first one to finish wins.

54:59

And so like if there's some variation in the latency because of some, you know, operating-system-level thing, or a garbage collection, because some of this was Java, or something else had happened, right, or one of them was just offline and was losing every race because it just wasn't processing anything, it was all fine.
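
A minimal sketch of that first-to-finish race in Python, using threads as stand-ins for the parallel replicas; `process` is hypothetical and assumed deterministic:

```python
import concurrent.futures

def process(replica_id, message):
    """Stand-in for one replica's deterministic handling of a message."""
    return f"result-for-{message}"  # every healthy replica gives the same answer

def first_result_wins(message, n_replicas=3):
    """Race identical processors; take whichever answer lands first."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=n_replicas)
    futures = [pool.submit(process, i, message) for i in range(n_replicas)]
    done, _ = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED)
    # Stragglers (a GC pause, a slow box, a dead replica) simply lose the
    # race; let them finish in the background rather than stalling the caller.
    pool.shutdown(wait=False)
    return next(iter(done)).result()
```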

55:11

Yeah.

55:17

right um I think one of the more interesting ah things from that is if you want to be tolerant of certain types of failures, you know like gamma ray burst type stuff where bits just get flipped, then the number of systems that you need to do this is three.

55:34

The number of the counting shall be three, ah, because you need to have more than two of them. Not one, not two.

55:38

Not one. Not two. Five is right out.

55:42

And five is actually kind of fine in this case, but you need at least three, right?

55:45

[laughs]

55:48

Um, because if you have two, and one says the answer is A and the other says the answer is B, you don't know which is right. You need three so that you can compare: OK, two of them say it's A and one of them says it's B, so B is suspect.

56:02

And if you have five and four of them say it's A and one of them says it's B, then that's even better, right? But you need at least three.
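
That voting rule is easy to write down. A minimal sketch, where each entry in `answers` is one replica's output for the same input message:

```python
from collections import Counter

def majority_vote(answers):
    """Return (winner, suspects) from N redundant replicas; N odd, N >= 3.

    The winner is the answer most replicas agree on; suspects are the
    indices of replicas that disagreed and should be pulled for inspection.
    """
    if len(answers) < 3:
        raise ValueError("need at least 3 replicas to outvote a flipped bit")
    winner, count = Counter(answers).most_common(1)[0]
    if count <= len(answers) // 2:
        raise RuntimeError("no majority: more than one replica disagrees")
    suspects = [i for i, a in enumerate(answers) if a != winner]
    return winner, suspects

# Two replicas say A, one says B: the B replica is suspect, not the answer.
print(majority_vote(["A", "A", "B"]))  # -> ('A', [2])
```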

56:06

Time to, yeah.

56:08

Yeah.

56:08

So if they're all running the same version of the code, it's time to, yes, start looking through your radiation hardening protocol for what on earth happened, or check the `dmesg` output for any kind of uncorrectable memory errors and things of that nature. But yeah, that's...

56:11

Yes. Right.

56:25

Yeah, I've just looked at the time, and we've been gabbling, you know, given that we hadn't really got a plan, which, you know, as our regular listener will know, is how we do this.

56:36

We've covered quite a lot of ground, although I don't know that we covered our intended topic exactly as I would have done if we'd written something out beforehand, because we went on so many tangents. But in a good way.

56:47

Like we talked about time, we talked about durability, we talked about scalability, um and all these things come out of a ah message-based system or can come out of a message-based system.

56:56

Mm hmm.

56:56

And especially if you have this sort of, like, journal-based thing where you say the sequence of messages is the only input into my state machine, and I can trivially start from the beginning of time and get to exactly the same state.

57:09

Or we can snapshot, if we know what internal state is important, at different points along that timeline, and have the best of all worlds, which is super cool.

57:15

Yeah.

57:16

Yeah. Um, so that's why I bring all that up, by way of saying maybe we should stop here.

57:18

yeah

57:23

So, um, this has been super cool. And definitely some deep memories from previous companies coming up there.

57:30

Oh, yeah. Bringing back the hard lessons of systems past.

57:35

"We make mistakes so you don't have to."

57:39

A new tagline on this podcast.

57:40

That is our new tagline, okay. We've certainly made... I've definitely made plenty of mistakes as well, you know. Um, as I shared on social media: a picture of me driving a car through a place where cars shouldn't go, and I got the car wedged. [ https://bsky.app/profile/matt.godbolt.org/post/3ldh76pqffc2z ]

57:53

Uh huh.

57:56

um Yeah, you have to look at me on

57:58

Three gigabytes' worth of car in your messaging system, and it didn't work.

58:01

In my 2.9 gigabyte ah hard drive, yeah, it didn't work very well anyway.

58:06

Yeah, yeah, yeah, yeah.

58:08

Well, I think we should leave it there, my friend. Thank you as ever for joining me in this endeavor of trying to... I don't know what we're doing. Trying to what?

58:15

huh Yeah, yeah, this was a good one.

58:16

Be entertaining and enjoy ourselves and hopefully be interesting and useful to other people too.

58:23

All right, friend, until next time.
