Saturday, March 25, 2017

Introducing Miranda: Motivation

In preparation for a talk that I'll give in June at DOSUG I'm going to post my thinking as it develops. The first section has to do with the motivation for Miranda.  The next section deals with how Miranda works and the last sections sums up.

So without further ado here's what I came up with:

The first section deals with the motivation for and the origins of the predecessor to Miranda: Prospero. It talks about how our boss wanted "9 9s of reliability" and how this works out to 30 milliseconds of downtime per year!  For the curious, I did some tests...

A ping of google.com yielded an average of time of 100ms.
micosoft.com and ibm.com timed out.

A system with "9 9s of reliability" is unattainable for the average company because it would cost too much.  You would need highly available hardware, several sites, a distributed, formally verified system, and hundreds if not thousands of developers.

Consider that the system would need to run on special hardware that has at least two "nodes" each of which has its own CPU, memory, disk and network connection.  Each node sends a "heartbeat" to the other nodes to ensure they are up. Remember that a failing node needs to switch over in less than 30 milliseconds when a fault occurs or the show's over!

Just the system that monitors the actual system would be challenging to design!

Such a system could, however, be built.

It would just probably cost billions of dollars, take years to develop, and require hundreds if not thousands of developers.



No comments:

Post a Comment