Sunday, March 26, 2017

Update Properties

It is time, once again, to update the properties page.  Since I haven't updated in a while, there is lots to do.

Saturday, March 25, 2017

Introducing Miranda: Motivation

In preparation for a talk that I'll give in June at DOSUG I'm going to post my thinking as it develops. The first section has to do with the motivation for Miranda.  The next section deals with how Miranda works and the last sections sums up.

So without further ado here's what I came up with:

The first section deals with the motivation for and the origins of the predecessor to Miranda: Prospero. It talks about how our boss wanted "9 9s of reliability" and how this works out to 30 milliseconds of downtime per year!  For the curious, I did some tests...

A ping of yielded an average of time of 100ms. and timed out.

A system with "9 9s of reliability" is unattainable for the average company because it would cost too much.  You would need highly available hardware, several sites, a distributed, formally verified system, and hundreds if not thousands of developers.

Consider that the system would need to run on special hardware that has at least two "nodes" each of which has its own CPU, memory, disk and network connection.  Each node sends a "heartbeat" to the other nodes to ensure they are up. Remember that a failing node needs to switch over in less than 30 milliseconds when a fault occurs or the show's over!

Just the system that monitors the actual system would be challenging to design!

Such a system could, however, be built.

It would just probably cost billions of dollars, take years to develop, and require hundreds if not thousands of developers.

Friday, March 24, 2017

Why not Prosepo? Why not Modify Prospero?

Up until now, I have been talking about the limitations of the Prospero system.  But wouldn't it be easier to just modify Prospero instead of starting from scratch?  That's what this post is about.

I had to create a new system because:

  • A total rewrite was required to get away from Erlang
  • Prospero is tightly tied to Mnesia
  • Prospero is not well documented

A Total Rewrite was Required to get away from Erlang

A limitation of Prospero was that finding people with Erlang experience was difficult or impossible. Using a language like Java means a total rewrite anyways.

Prospero is tightly tied to Mnesia

The database that Prospero uses, Mnesia, is very tightly coupled to Prospero.  One of Mnesia's limitations is that it doesn't cross availability zones well.  Getting Prospero to work with multiple availability zones would require replacing Mnesia - a major rewrite.  Given that situation, a total rewrite would not require too much additional effort.

Prospero is not well Documented

While Prospero (Subpub in its latest incarnation) is widely used it is not well documented in terms of comments and the like.  The amount of effort required to document Prospero would approach the effort required for a rewrite.

Thursday, March 23, 2017

Why not Prospero? Miscellaneous

This is part of a multi-part discussion of why I started the Miranda project.  This post is a grab-bag of issues that I had with the original system.

Prospero Only Records HTTP POSTs

The original system only captures HTTP POST events. It didn't forward PUT and DELETE.

Prospero Only Uses HTTP for New Events

You could not send new events via HTTPS.  Furthermore, all communications between nodes was unencrypted, making it unsuitable for a non-secure environment.

Prospero Uses Symmetric Key Encryption

This meant that administrators knew all the keys; and if an attacker gains access to one table, they get all the client messages.

Miranda Addresses All these Issues

Miranda forwards HTTP PUT and DELETE, as well as POST.

Miranda does everything using SSL/TLS dealing with the 2nd issue.

Miranda uses public key encryption instead of secret key encryption. Thus administrators don't actually have client keys and an attacker gains no advantage by getting access to the keys that clients use.

Wednesday, March 22, 2017

Why not Prspero? Availability Zones

Prospero cannot work across availability zones: so you cannot have a West coast data center and an East coast data center work together.  This is, admittedly a rare problem: only a few companies even have a data center these days, but with the advent of services like AWS people want this capability for their systems.

Prospero relies on a distributed database called Mnesia, and Mnesia does not deal well with long delays, such as those you might run across in coast-to-coast operations.  In addition Prospero uses RabbitMQ, which also wants all its nodes to be in the same data center.  For these reasons, Propero is limited to one data center.

This was a problem.  For a mission critical application to go down if you lose one data center is bad. If a hurricane takes down a data center for a week this could be a very bad thing indeed.

We never did come up with a solution, but Miranda is designed for distributed use.

Tuesday, March 21, 2017

Why not Prospero? Erlang

This is the first part of a multi-part discussion of why I started the Miranda project.  This post goes into the limitations of the language that Prospero was written in, Erlang.

Erlang was created at Ericsson (a telecom company based in Stockholm, Sweden) in the 80s and released to the world in the 90s. It is used with "soft real-time" (where you can occasionally miss a deadline).

Two major drawbacks to Erlang are that it is hard to find people with Erlang experience and it can take several months for someone used to a language like Java to become proficient in Erlang.

It is Hard to Find People with Erlang Experience

Unlike C++, Java or C#, it is much harder to find people who have experience with Erlang.  At Pearson in Denver, for example, we had to contract with a firm in Europe to get support for Ejabberd, a chat program written in Erlang.

When we tried to find new members for the team to support Prospero, we had to dispense with Erlang experience as a requirement because nobody had it.

Erlang is basically a niche language in this country, with few adherents.

It is Hard to Train People in Erlang

As a rule of thumb, it would take several months before a developer was "up to speed" with Erleng.

In learning Erlang, one had to learn a different style of development called Functional Programming. An important difference with Functional Programming is that there are no variables - so a statement like "i++" should not be supported in a functional language. This is very different from traditional (imperative) languages, and takes awhile to get used to. 

Erlang syntax is also very different from the various "C-like" languages.  For example, if expressions in Erlang cannot have function calls and are seldom used.

The difference in programming styles and syntax combine to make Erlang a difficult language to pick up.

Erlang also has Good Points

Erlang also has its good points like light weight processes.  I once created a program that used several million threads (called processes in Erlang), but I couldn't do the same thing in Java.  At several thousand threads the VM wanted more memory than the system had.

Monday, March 20, 2017

Why not Prospero?

Prospero, the predecessor to Miranda, was a system written by Chris Chew (with a little help from me) in Erlang to capture HTTP POSTs and play them back to interested parties.

The question is this: why not just use Prospero?  Why go to the trouble of writing a new system?  I will answer this question in a series of posts.  This is partially because the answer is that complex, but also so that I have something to talk about for the next few days.

In broad strokes, here is the answer:

  • Prospero is written in Erlang
    • It is hard to find people who have experience with Erlang
    • It is hard to train people to use Erlang
  • Prospero cannot cross availability zones
    • Mnesia limitation
  • Prospero depends on many other systems
    • Erlang
    • RabbitMQ
    • Mnesia
  • Why not modify Prospero?
  • Misc reasons
    • Prospero only deals in POSTs
    • Prospero only runs over HTTP, not HTTPS