Why doesn't Ardour offer "plugin crash protection"?

It is a commonly asked question: why doesn't Ardour offer plugin crash protection in the way that (e.g.) Bitwig and Reaper do?

The answer can be given in two ways, one simple and one very technical.

The Simple Answer

It doesn't scale up to work with large sessions with low latency settings.

What's large? "Large" for us starts at about 100+ tracks with an EQ, compressor and one other plugin on every track. What's low latency? On the order of 5ms between pressing a key, clicking the mouse or turning a knob and hearing the result of that action.

The Complex, Technical Answer

What is sandboxing/crash protection?

In the original conception of plugins, they were intended to be "blobs" of code loaded into the DAW itself and run "inside" it. This is very efficient, but also more dangerous since the plugin theoretically could intentionally or accidentally damage or interfere with any part of the DAW's functionality. Some DAWs decided to try to mitigate this risk by "sandboxing" them: running the plugins in an entirely separate program. The plugin isn't really a part of the DAW at all, but communicates back and forth with it using some sort of relatively efficient mechanism. If the plugin misbehaves, only the separate program crashes, not the DAW itself.

Context Switching

When a computer runs more than one program (which these days they do all the time), the process of changing from one program to another is called a "context switch".

When a DAW runs a plugin "inside itself", the computer has no idea that this is happening - it is entirely a job for the DAW to manage. Consequently in the "traditional" plugin model, there's just a single program (the DAW), and no context switches.

By contrast, when plugins are sandboxed in a separate program, the computer itself has to be involved in the decision to run a plugin. Every time the DAW needs one or more plugins to run, there must be at least two context switches: one from the DAW to the program that runs the plugin(s), and another back to the DAW. Despite the steady (though slowing) increase in the speed of modern CPUs, the time it takes to carry out a context switch has not improved much.

Here's a link to a site that provides some (slightly outdated) information on how long a context switch can take. There are two parts to a context switch: there is a fixed cost involving saving/restoring all the registers of the CPU, and then there is a variable cost that depends on the memory usage patterns of the programs running before and after the context switch. The register save/restore cost has actually increased (a little) over time, because CPUs have more registers; the variable cost is still rooted in computer designs (L1 cache and the translation lookaside buffer (TLB)) that have not changed much in many years.

If you're interested in measuring how long a context switch takes on your system, there are tools available.
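As a rough illustration of how such a measurement can work, here is a minimal sketch (not one of the tools referred to above) that bounces a single byte between two processes over a pair of pipes. It assumes a Unix-like system where `os.fork` is available; the function name `pipe_round_trip_usec` is made up for this example. Each round trip forces the operating system to wake the other process and then wake the first one again, so on a single core it involves at least two context switches. The result also includes system call and Python interpreter overhead, so treat it as a loose upper bound rather than a precise figure.

```python
import os
import time


def pipe_round_trip_usec(rounds=2000):
    """Average parent -> child -> parent round trip time in microseconds."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child process: echo one byte back for every byte received.
        os.close(to_child_w)
        os.close(to_parent_r)
        for _ in range(rounds):
            os.read(to_child_r, 1)
            os.write(to_parent_w, b"x")
        os._exit(0)

    # Parent process: send a byte, then block until the echo arrives.
    os.close(to_child_r)
    os.close(to_parent_w)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(to_child_w, b"x")
        os.read(to_parent_r, 1)
    elapsed = time.perf_counter() - start

    os.waitpid(pid, 0)
    os.close(to_child_w)
    os.close(to_parent_r)
    return elapsed / rounds * 1e6


if __name__ == "__main__":
    print(f"average round trip: {pipe_round_trip_usec():.1f} usec")
```

Halving the round trip time gives a crude per-switch estimate. The numbers will vary from run to run and from machine to machine.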

Context Switching in audio software

Now let's look at what this means in an audio context. We start by noting that every time your audio interface has or requires data, the operating system must "wake up" the DAW so that it can start on the low level work of actually processing audio. If you run your audio interface hardware at a 48kHz sample rate and a buffer size of 64 samples, the DAW has 1.3 milliseconds to process 64 samples of incoming and outgoing data. If it doesn't get the work done in time, you will hear a click or "dropout" or, as we tend to call them in Ardour, an "xrun" (short for "overrun or underrun").
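The 1.3 millisecond figure is simply the buffer size divided by the sample rate. A small sketch of that arithmetic (the function name `buffer_period_ms` is just for illustration):

```python
def buffer_period_ms(sample_rate_hz, buffer_size_samples):
    """Time available to process one buffer of audio, in milliseconds."""
    return buffer_size_samples / sample_rate_hz * 1000.0


# 64 samples at 48kHz
print(round(buffer_period_ms(48_000, 64), 2))  # 1.33
```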

Now let's imagine a session with 128 tracks, each of which has an EQ, compressor and some other plugin present. We have a total of 384 plugins for the session. There are several scenarios for how these plugins might be run in a separate process, depending on the technological sophistication of the developers and the signal flow within the session itself.

Least optimized/most likely: Each plugin is told to run explicitly by the DAW
Somewhat optimized/somewhat likely: The plugins for each track run as a unit, but each track is handled independently.
Highly optimized/not likely: All plugins are run by a single command from the DAW

In the first case, there are 384 context switches from the DAW to the plugin(s), and another 384 context switches back to the DAW from the plugin(s).

In the second case, which is possible only if there is no reason for the DAW to require the results from each plugin, there is one context switch per track from the DAW to the plugin(s), and another per track to return to the DAW.

In the third case, which is fairly unlikely but not impossible to engineer, there is a single context switch from the DAW to the plugin(s) process, and another back to it.

So, we have anywhere from 768 to 2 context switches for each block of audio processing. The third scenario that has only 2 is quite difficult to implement and like the second requires that the DAW has no reason to access the data from each plugin before running the next plugin in the signal processing chain. So we're going to ignore that one, since it's not particularly realistic (consider a session with a single plugin before ("pre") and after ("post") the fader as an example of a simple scenario that breaks this assumption).
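The counts for the three scenarios can be reproduced with a few lines. This is only a sketch of the counting argument above; the scenario names (`per_plugin`, `per_track`, `single_call`) are labels invented for this example.

```python
def context_switches_per_block(tracks, plugins_per_track, scenario):
    """Context switches needed to process one block of audio.

    Every hand-off to the plugin process costs two switches:
    one to reach it and one to return to the DAW.
    """
    round_trips = {
        "per_plugin": tracks * plugins_per_track,  # each plugin run explicitly
        "per_track": tracks,                       # each track's plugins run as a unit
        "single_call": 1,                          # everything run by one command
    }[scenario]
    return 2 * round_trips


for scenario in ("per_plugin", "per_track", "single_call"):
    print(scenario, context_switches_per_block(128, 3, scenario))
# per_plugin 768
# per_track 256
# single_call 2
```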

Real world scenarios thus call for 256 to 768 context switches per block of audio processing. On a typical modern CPU/chipset, the fixed cost of a context switch is on the order of 3usec. The variable cost part is hard to predict (because it depends on the amount of memory a program accesses and the arrangement of that memory). Using the measurement tool linked above, we can estimate that real world costs per context switch for audio processing code are between 10usec and 300usec (0.3msec). Let's assume that our real world average is 30 usec. That means anywhere from 7.7msec to 23msec spent doing nothing but context switches!
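For anyone who wants to try other assumptions, the estimate is a single multiplication. The sketch below uses the assumed 30 usec average from above; the function name `switch_overhead_ms` is hypothetical.

```python
def switch_overhead_ms(switches_per_block, usec_per_switch):
    """Time spent purely on context switches per block, in milliseconds."""
    return switches_per_block * usec_per_switch / 1000.0


print(switch_overhead_ms(256, 30))  # 7.68
print(switch_overhead_ms(768, 30))  # 23.04
```

Plugging in the 10usec and 300usec extremes instead of the 30usec average shows how widely the result can swing.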

The Bottom Line

Our scenario (48kHz sample rate, 64 samples per buffer) has 1.3msec to process all audio. Clearly, we cannot spend 7.7-23msec on context switches. This isn't going to work.

Things don't start to work until we get up to buffer sizes of at least 370 to 1100 samples (7.7-23msec at 48kHz). Even with these settings, there would be zero time to do any actual audio processing, so let's assume that our DAW was going to run with about a 50% DSP load. We would need buffer sizes of about 740 - 2200 samples, or roughly 15-46 msec. Even though this would work, we'd be wasting roughly half the available processing time on context switches and our computer would be straining to keep up, all to enable a specific engineering solution necessitated only by badly behaved plugins.
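The buffer sizes can be recomputed from the context switch overhead. The sketch below uses the unrounded overheads of 7.68 and 23.04 msec (256 and 768 switches at an assumed 30usec each), so results from rounded inputs will differ slightly. The function name `min_buffer_samples` and the `dsp_load` parameter are illustrative only.

```python
import math


def min_buffer_samples(overhead_ms, sample_rate_hz, dsp_load=0.0):
    """Smallest buffer (in samples) whose period fits the switch overhead.

    dsp_load is the fraction of each period reserved for real audio
    processing; the switches must fit in the remaining fraction.
    """
    period_ms = overhead_ms / (1.0 - dsp_load)
    return math.ceil(period_ms * sample_rate_hz / 1000.0)


for overhead_ms in (7.68, 23.04):
    no_dsp = min_buffer_samples(overhead_ms, 48_000)
    half_dsp = min_buffer_samples(overhead_ms, 48_000, dsp_load=0.5)
    print(overhead_ms, no_dsp, half_dsp)
```

In practice audio interfaces usually offer only a fixed set of buffer sizes, so the usable setting would be the next available size above these values.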

Additionally, many people find latency at this level to be quite noticeable and somewhat annoying. So clearly there's a range in which out-of-process/sandboxed plugins will work OK if the user is willing to deal with larger latencies. If the user expects/requires the kind of latency you would see in "hardware solutions", then this design cannot work.

Summary

It may work for a 4-track Bitwig session with 12 plugins or thereabouts, but it's not suitable for any large scale work, and certainly not at the very low latency settings that Ardour can be run with.

In short, plugin sandboxing is a way to use plugins that crash their hosts so that when/if they crash, they no longer crash the host. We think that it is a better idea to simply not use these plugins (or, if you prefer, use a different host).