
Representing Time in Ardour

Within Ardour, there are two fundamentally different ways of specifying positions along a timeline, as well as distances between two such positions.

Audio time: a linear, monotonic timebase. Units are called "superclocks", which provide a sample-rate independent means to specify time. There are 508032000 superclocks per second.
Beat time: a linear, monotonic timebase. Units are beats, which are arbitrarily defined as 1/4 notes.
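
As a rough illustration of why superclocks are sample-rate independent, the sketch below converts between samples and superclocks for a given sample rate. Only the constant 508032000 comes from the text above; the helper names are hypothetical and are not libtemporal's API, and the plain multiply-then-divide arithmetic ignores the rounding and overflow handling that real code would need.

```cpp
#include <cstdint>

typedef std::int64_t superclock_t;
typedef std::int64_t samplepos_t;

// Value taken from the text above.
constexpr superclock_t superclocks_per_second = 508032000;

// Hypothetical helper: samples -> superclocks at a given sample rate.
// Truncating integer division; no overflow protection in this sketch.
constexpr superclock_t samples_to_superclocks (samplepos_t samples, std::int64_t sample_rate)
{
    return samples * superclocks_per_second / sample_rate;
}

// Hypothetical helper: superclocks -> samples at a given sample rate.
constexpr samplepos_t superclocks_to_samples (superclock_t sc, std::int64_t sample_rate)
{
    return sc * sample_rate / superclocks_per_second;
}

// One second is the same number of superclocks whatever the sample rate.
static_assert (samples_to_superclocks (48000, 48000) == superclocks_per_second, "1s at 48 kHz");
static_assert (samples_to_superclocks (44100, 44100) == superclocks_per_second, "1s at 44.1 kHz");
static_assert (superclocks_to_samples (superclocks_per_second, 96000) == 96000, "1s at 96 kHz");
```

Both 44100 and 48000 divide 508032000 exactly, so a single sample at either rate maps to a whole number of superclocks.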

The choice to use 1/4 notes is arbitrary in that any other possible choice for the note length of a beat is a simple integer multiple (or fraction) of a 1/4 note, and thus has no impact on the semantics of "a beat". We could use whole notes as "beats", which would simply introduce a factor of 4 into all arithmetic relating "beats" to quarter notes. We choose 1/4 notes because this is conventional in many other DAWs, the SMF specification (to an extent) and several other related areas.

Beat time and audio time are related to each other via a tempo map. In its simplest form, a tempo map is a data structure that defines a single tempo (number of beats per minute) and a single meter (number of beats per musical measure, or "bar"). Converting between beat time and audio time requires only the tempo.
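
For the simplest case (one constant tempo), the conversion reduces to a single multiplication or division. The following is a minimal sketch under that assumption, using whole beats and an integer tempo for brevity; the function names are hypothetical and do not correspond to the tempo map's real interface.

```cpp
#include <cstdint>

// Value taken from the text above.
constexpr std::int64_t superclocks_per_second = 508032000;

// Hypothetical helper: length of one beat (1/4 note) at a constant tempo.
constexpr std::int64_t superclocks_per_beat (std::int64_t bpm)
{
    return superclocks_per_second * 60 / bpm;
}

// Beat time -> audio time, valid only while the tempo never changes.
constexpr std::int64_t beats_to_superclocks (std::int64_t beats, std::int64_t bpm)
{
    return beats * superclocks_per_beat (bpm);
}

// Audio time -> beat time (whole beats, truncating), same assumption.
constexpr std::int64_t superclocks_to_beats (std::int64_t sc, std::int64_t bpm)
{
    return sc / superclocks_per_beat (bpm);
}

// At 120 bpm a beat lasts half a second, so 4 beats last two seconds.
static_assert (beats_to_superclocks (4, 120) == 2 * superclocks_per_second, "4 beats at 120 bpm");
```

Note that the meter plays no part in the arithmetic, which is why only the tempo is needed here.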

However, in more complex tempo maps, there may be arbitrary numbers of different tempos placed at (somewhat) arbitrary positions and any or all of them may be ramped (they are not constant). This means that in general, the relationship between audio time and beat time is complex, and cannot be inferred from trivial arithmetical formulas. In particular, it means that the number of beats represented by a certain number of superclocks (and vice versa) may change along the timeline (because of elements in the tempo map changing the relationship).
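
To make this concrete, here is a small sketch of a tempo map with two constant (non-ramped) tempo sections. The types and functions are hypothetical and far simpler than the real tempo map (no ramps, no meters, seconds as floating point), but they are enough to show that the same number of beats covers different amounts of audio time depending on where it sits.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified tempo section: constant tempo from start_beat onwards.
struct TempoSectionSketch {
    double start_beat;
    double bpm;
};

// Seconds elapsed from the origin to `beat`.
// Assumes sections are sorted by start_beat and the first starts at 0.
double beats_to_seconds (const std::vector<TempoSectionSketch>& map, double beat)
{
    double seconds = 0.0;
    for (std::size_t i = 0; i < map.size (); ++i) {
        if (beat <= map[i].start_beat) {
            break; // the position lies before this section
        }
        // Section ends where the next one starts (or at `beat` for the last one).
        const double section_end = (i + 1 < map.size ()) ? map[i + 1].start_beat : beat;
        const double end = std::min (beat, section_end);
        seconds += (end - map[i].start_beat) * 60.0 / map[i].bpm;
    }
    return seconds;
}

// Audio duration of `length` beats located at `start` beats.
double duration_seconds (const std::vector<TempoSectionSketch>& map, double start, double length)
{
    return beats_to_seconds (map, start + length) - beats_to_seconds (map, start);
}

// Example map: 120 bpm from the origin, dropping to 60 bpm at beat 8.
std::vector<TempoSectionSketch> example_map ()
{
    return { { 0.0, 120.0 }, { 8.0, 60.0 } };
}
```

With this map, 4 beats starting at the origin last 2 seconds, 4 beats starting at beat 8 last 4 seconds, and 4 beats starting at beat 6 (straddling the tempo change) last 3 seconds.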

Data Types

Fundamental Types

There are two fundamental types used to represent domain-specific values:

superclock_t: a typedef for int64_t, represents an audio time position counted in superclocks (i.e. sample-rate independent). This data type almost never appears outside of libtemporal, other than in serialization/deserialization code since we store audio time positions using this type. In a running instance of the program, we know the sample rate, and thus the overwhelmingly common type for audio time values is samplepos_t.
samplepos_t: a typedef for int64_t, represents an audio time position counted in samples (i.e. sample-rate dependent, unlike superclocks).
samplecnt_t: a typedef for int64_t, represents an audio time distance counted in samples (i.e. sample-rate dependent, unlike superclocks).
Beats: a class that represents a music time position or distance counted in 1/4 notes from an implicit origin. It is a fixed point integer type, with 32 bits for the whole beat component, and 32 bits for the fractional beats, denominated by Beats::PPQN (aka Temporal::ticks_per_beat, set at 1920). All regular integer arithmetic operators are available, using both integers and Beats as arguments.
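
The whole-beat/tick split used by Beats can be illustrated with a toy type. This is only a sketch: BeatsSketch and its helpers are hypothetical, handle non-negative values only, and are not the real Beats class. The value 1920 for PPQN is the one given above.

```cpp
#include <cstdint>

// Ticks per 1/4 note, as stated in the text above.
constexpr std::int32_t PPQN = 1920;

// Hypothetical stand-in for Beats: whole beats plus fractional ticks.
struct BeatsSketch {
    std::int32_t beats;
    std::int32_t ticks; // 0 .. PPQN-1
};

// Flatten to a single tick count: beats * PPQN + ticks.
constexpr std::int64_t to_ticks (BeatsSketch b)
{
    return static_cast<std::int64_t> (b.beats) * PPQN + b.ticks;
}

// Rebuild the (beats, ticks) pair from a non-negative tick count.
constexpr BeatsSketch from_ticks (std::int64_t t)
{
    return BeatsSketch { static_cast<std::int32_t> (t / PPQN), static_cast<std::int32_t> (t % PPQN) };
}

// Addition carries overflowing ticks into the whole-beat component.
constexpr BeatsSketch add (BeatsSketch a, BeatsSketch b)
{
    return from_ticks (to_ticks (a) + to_ticks (b));
}

static_assert (to_ticks (BeatsSketch { 12, 0 }) == 23040, "12 beats in ticks");
static_assert (add (BeatsSketch { 1, 1000 }, BeatsSketch { 2, 1000 }).beats == 4, "carry into beats");
static_assert (add (BeatsSketch { 1, 1000 }, BeatsSketch { 2, 1000 }).ticks == 80, "remaining ticks");
```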

samplepos_t and samplecnt_t are essentially interchangeable because audio time is linear and monotonic. The existence of two different typedefs for the same underlying primitive type serves just a syntactic purpose - making it clear(er) when we are referring to a position or distance.

Distance versus Position

Because of the non-linear relationship between audio time and beat time, whenever we need to represent a distance, we also need to specify where the distance is located. It is not adequate to specify "4 beats" because the audio time duration corresponding to 4 beats may differ at various points along the timeline. Similarly, we cannot know how many beats 20,000 samples represents unless we know where that sample duration is on the timeline. Consequently, all measures of distance must form a pair:

    (distance, at-position)

However, there is one special case which is widely used. That is the case where the distance is being measured from an implicit origin (the "zero" of the timeline, or graphically speaking, its left edge). When we talk about the position of things along the timeline (for example, regions), we are actually talking about the distance between their position and the origin. We therefore need only specify the first element of the pair above, because the "at-position" is implicit (zero).

Consequently, we have two basic time types present within the program: timepos_t and timecnt_t. The former is used to hold positions, that is, distances measured from the timeline origin. The latter is used to hold distances measured from arbitrary positions. In some contexts, it is more accurate to label distances as "durations". Either way, timecnt_t is a pair of values, whereas a timepos_t is a single value.

A different way to think about this question is in terms of the result of a subtraction operation. Suppose we wish to compute two values:

the distance between two positions
the position obtained when moving a certain distance earlier from another

In most languages (and certainly in C/C++) we would use subtraction to do this:

    distance = later_position - earlier_position;
    earlier = position - distance;

A little thought will make it clear that the data type returned in the first line must be different from the data type returned in the second line, if we need to be able to (potentially) change time domains. Suppose that the distance variable was a simple scalar value, say 2000 samples. How could we convert that to beats? We need to know where on the timeline it is located so that we could check the tempo map and do the conversion. The same is true if the scalar value was 19 beats, and we wanted to convert to samples. To be able to convert distances between domains, we need to know the position at which the distance occurs, and that means that in general we need a pair of values as indicated above.

int62_t

Both timepos_t and timecnt_t rely on a lower level data type called int62_t. This is a fundamental building block of Ardour's time representation, and is, at its core, just an int64_t. Such a data type offers us 63 bits to represent a numeric value, plus a single sign bit. For int62_t, we "steal" the most significant of those 63 value bits to act as a flag for the time domain. If the flag is set (i.e. that bit is 1), then the value stored in the remaining 62 bits is in the beat time domain. If the flag is not set (that bit is 0), then the value stored in the remaining 62 bits is in the audio time domain. To avoid potential threading issues, the 64 bit value is always handled using atomic reads and writes, ensuring that the flag bit and the value can never be modified independently. There is never any doubt about the semantics of the value held - it is either an audio time value, or a beat time value. Note that the implementation of int62_t does not use terminology related to time at all: the flag bit is "just a flag bit", and doesn't have any inherent semantics. Those are left for users and derivatives of this class.

Within the code, int62_t is intended to function identically to an int64_t in every way. All integer operations are available and should work as expected. You may add, subtract, divide, take the modulo of (and so on) an int62_t, and the result should match your intuition. Any case in which this does not work should be considered an implementation bug.

One exception to this rule is that for obvious reasons, the range of an int62_t type is smaller than that of an int64_t. We revisit this issue below, because it intersects with the question of the range of each time domain.
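
The idea of a flag bit packed alongside a signed value, read and written as one atomic word, can be sketched as follows. This is an illustration only, not the real int62_t: the class name, the choice of bit 62 for the flag, and the sign-extension scheme are all assumptions made for the example. It also relies on the usual two's complement behaviour when converting between uint64_t and int64_t (guaranteed from C++20, implementation-defined but universal in practice before that).

```cpp
#include <atomic>
#include <cstdint>

// Assumed layout for this sketch: bit 63 = sign, bit 62 = flag, bits 0..61 = value.
constexpr std::uint64_t flag_mask = 1ULL << 62;
constexpr std::uint64_t sign_mask = 1ULL << 63;

// Hypothetical stand-in for int62_t.
class FlaggedInt62Sketch {
public:
    FlaggedInt62Sketch (bool flag, std::int64_t value) : raw (pack (flag, value)) {}

    // Flag and value always come from a single atomic load of the same word.
    bool         flagged () const { return (raw.load () & flag_mask) != 0; }
    std::int64_t value ()   const { return unpack (raw.load ()); }

    // Flag and value are replaced together by a single atomic store.
    void set (bool flag, std::int64_t value) { raw.store (pack (flag, value)); }

private:
    static std::uint64_t pack (bool flag, std::int64_t value)
    {
        // Drop bit 62 of the value, then put the flag there.
        const std::uint64_t u = static_cast<std::uint64_t> (value) & ~flag_mask;
        return flag ? (u | flag_mask) : u;
    }

    static std::int64_t unpack (std::uint64_t word)
    {
        // Remove the flag, then restore bit 62 as a copy of the sign bit.
        std::uint64_t u = word & ~flag_mask;
        if (word & sign_mask) {
            u |= flag_mask;
        }
        return static_cast<std::int64_t> (u);
    }

    std::atomic<std::uint64_t> raw;
};
```

Because the flag occupies one of the value bits, values whose magnitude needs that bit cannot be represented, which is the reduced range mentioned above.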

timepos_t

timepos_t IS-A int62_t, and extends its parent class with a variety of methods related to time domains. A timepos_t can be constructed from a samplepos_t (or samplecnt_t) representing an audio time value counted in samples. Alternatively, it can be constructed using a Beats. In both cases, the value stored in the int62_t underlying type is derived from the constructor argument, and is not the literal argument. For samples, we convert to superclocks. For Beats, we convert to "ticks" (beats * PPQN + ticks).

A timepos_t can be used to represent distance also, as long as all users of the value agree on the implicit (zero) origin. A timepos_t representing a distance of 12 beats, for example, implicitly means "12 beats from the timeline origin". For example:

    timepos_t twelve_beats_from_zero (Beats (12));

    // compare with actual timecnt_t distance expressions

    timecnt_t also_twelve_beats_from_zero (Beats (12), timepos_t (BeatTime));
    timecnt_t another_twelve_beats_from_zero (Beats (12)); // audio time domain zero

The second argument in the second constructor denotes zero-in-beat-time, while the lack of a second argument in the third constructor denotes zero-in-audio-time. Zero-in-beat-time and zero-in-audio-time should always be equivalent.

To construct a timepos_t representing zero, we can just call the constructor with no arguments. However, this has some danger because this will use audio time by default, which may not be what is intended. To counter this, the implementation of timepos_t attempts to treat zero in either time domain as identical. This expression must (and does) evaluate to true:

    // compare zero (audio time) with zero (beat time)
    timepos_t (AudioTime) == timepos_t (BeatTime)

There is still room for danger here, however. Adding a beat time value to zero-in-audio-time will generate an audio time value. This may not be what was intended.

Returning to the observations above about the difference between computing a distance between two positions, or the result of shifting a position earlier in time, we must note one very important restriction on timepos_t. With a normal arithmetic type, we would compute either of these using subtraction. But because (a) we need to return different types for each operation and (b) we wish to be clear in the code about what we are actually doing, timepos_t has no accessible operator-(). You must use the distance (timepos_t const &) or earlier (timecnt_t const &) methods to compute either of these, as in:


    timecnt_t four_beats (Beats (4)); // a distance of 4 beats, as measured at zero
    timepos_t pos (Beats (18)); // a position of 18 beats from the origin

    timepos_t earlier_pos = pos.earlier (four_beats); // 14 beats from origin
    timecnt_t distance = pos.distance (timepos_t (Beats (4))); // -14 beats at 18 beats from origin
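
The difference in return types can be modelled with two toy structs. The names below are hypothetical, everything is kept in a single domain (ticks, at 1920 per beat), and the sign convention (distance measured from the first position to the second, located at the first) is chosen to match the comments in the example above rather than taken from the real implementation.

```cpp
#include <cstdint>

// Hypothetical position: a distance from the timeline origin, in ticks.
struct PosSketch {
    std::int64_t ticks;
};

// Hypothetical distance: an amount plus the position at which it is measured.
struct CntSketch {
    std::int64_t ticks;
    PosSketch    at;
};

// Position - position yields a (distance, at-position) pair.
constexpr CntSketch distance (PosSketch from, PosSketch to)
{
    return CntSketch { to.ticks - from.ticks, from };
}

// Position - distance yields another position.
constexpr PosSketch earlier (PosSketch p, CntSketch d)
{
    return PosSketch { p.ticks - d.ticks };
}

constexpr std::int64_t ticks_per_beat = 1920;

// 18 beats moved 4 beats earlier is 14 beats from the origin.
static_assert (earlier (PosSketch { 18 * ticks_per_beat },
                        CntSketch { 4 * ticks_per_beat, PosSketch { 0 } }).ticks == 14 * ticks_per_beat,
               "earlier() returns a position");
```

Since neither function is spelled operator-, the call site always states which of the two results is wanted.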

timecnt_t

timecnt_t HAS-A int62_t member (distance), along with a timepos_t member (position). A timecnt_t object can be constructed in the following ways:

    timecnt_t zero_audio; // zero distance, in audio time
    timecnt_t zero_beats (BeatTime); // argument specifies time domain
    timecnt_t about_a_second (48000); // 48k samples, from zero origin
    timecnt_t about_a_second_at_10_beats (48000, Beats(10)); // 48k samples at 10th beat
    timecnt_t four_beats_at_about_two_seconds (Beats(4), 88100);

Coding Guidelines

What Type to Use?

For a variable/member that will ONLY contain audio time, and is never serialized to disk, use samplepos_t or samplecnt_t for position or distance/duration, respectively.
For a variable/member that will ONLY contain beat time, use Beats.
For a variable/member that represents a position on the timeline, use timepos_t.
For a variable/member that represents a distance/duration along the timeline, use timecnt_t.
If considering using superclock_t, think long and hard. It's probably not what you want.
If considering using int62_t, you've made a mistake.

Based on the rules above, we can note that for instance, the code dealing with the fade in/out on an audio region uses samplepos_t/samplecnt_t. These fades have a duration that only makes sense to define in terms of audio time.

Using timeline types and time domains

Arithmetic operations do not change the time domain of the result. Adding a value in any time domain to an audio time value will always result in another audio time value. Subtracting a value in any time domain from a beat time value will always result in another beat time value. This applies to all arithmetic operators. The consequence of this is that if a timepos_t is created or assigned a value in the "wrong" time domain, no arithmetic operations will alter that time domain.
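
The rule that the left-hand operand fixes the time domain of the result can be sketched with a toy type. Everything here is hypothetical: TimeSketch is not timepos_t, and the conversion assumes a fixed 120 bpm at 48 kHz (24000 samples per beat) instead of consulting a tempo map.

```cpp
#include <cstdint>

enum class Domain { Audio, Beat };

// Hypothetical tagged time value: samples for Audio, whole beats for Beat.
struct TimeSketch {
    Domain       domain;
    std::int64_t value;
};

// Assumed fixed conversion for this sketch: 120 bpm at 48 kHz.
constexpr std::int64_t samples_per_beat = 24000;

// Express `t` in domain `d`, converting only if the domains differ.
constexpr std::int64_t in_domain (TimeSketch t, Domain d)
{
    return (t.domain == d)      ? t.value
         : (d == Domain::Audio) ? t.value * samples_per_beat
                                : t.value / samples_per_beat;
}

// The result always carries the domain of the left-hand operand.
constexpr TimeSketch add (TimeSketch lhs, TimeSketch rhs)
{
    return TimeSketch { lhs.domain, lhs.value + in_domain (rhs, lhs.domain) };
}

// audio + beats stays audio; beats + audio stays beats.
static_assert (add (TimeSketch { Domain::Audio, 48000 }, TimeSketch { Domain::Beat, 2 }).domain == Domain::Audio, "");
static_assert (add (TimeSketch { Domain::Beat, 2 }, TimeSketch { Domain::Audio, 48000 }).domain == Domain::Beat, "");
```

In this model, a value that starts out in the "wrong" domain stays there through any chain of additions, which is the behaviour described above.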

How To ... ?

Convert a timeline position from one time domain to another

Use methods of timepos_t (e.g. samples(), beats()) rather than asking the tempo map directly.

Convert a timeline distance/duration from one time domain to another

Use methods of timecnt_t (e.g. samples(), beats()) rather than asking the tempo map directly.

Access the tempo map

The tempo map is a "thread local" value: it may have different values in different threads. Any thread may obtain its current tempo map by calling TempoMap::use(). This returns a shared pointer to a tempo map that the thread can safely use for any purpose except modifying the map (see below).

Modify the tempo map

The tempo map is a thread-local value that is managed using RCU (read-copy-update). To modify it, a thread must obtain a copy of the map, make changes to it, and then request an update to the "canonical" version of the map (which will fail if the map has been modified since the copy was made).

Call TempoMap::fetch_writable() within the thread that will do the update.
Make modifications to the tempo map accessed via TempoMap::use(), all from the thread that called fetch_writable().
When finished, call TempoMap::update (TempoMap::use()), from the thread that called fetch_writable().
If you decide to abandon the changes, be sure to call TempoMap::abort_update(), from the thread that called fetch_writable(). Failure to do so will likely cause crashes.

These methods are all thread-safe: multiple threads can access them simultaneously. Note however that if there are simultaneous modifications to the tempo map, only the first call to TempoMap::update() will succeed. Updaters should verify that the update succeeded, although in current code (early 2021), retrying the modifications is difficult.
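
The "only the first update succeeds" behaviour is characteristic of read-copy-update schemes in general. The sketch below shows the general pattern with a mutex-protected shared pointer. It is not the TempoMap implementation, and none of the names (MapSketch, MapManagerSketch, read, update) exist in Ardour; they are placeholders for the example.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a tempo map.
struct MapSketch {
    double bpm = 120.0;
};

// Hypothetical holder of the "canonical" map version.
class MapManagerSketch {
public:
    MapManagerSketch () : current (std::make_shared<MapSketch> ()) {}

    // Readers get a shared pointer to the current, immutable version.
    std::shared_ptr<const MapSketch> read () const
    {
        std::lock_guard<std::mutex> lock (mutex);
        return current;
    }

    // A writer's copy replaces the canonical version only if nobody else
    // has replaced it since the writer took its copy.
    bool update (std::shared_ptr<const MapSketch> const& based_on,
                 std::shared_ptr<const MapSketch> replacement)
    {
        std::lock_guard<std::mutex> lock (mutex);
        if (current != based_on) {
            return false; // someone else updated first
        }
        current = std::move (replacement);
        return true;
    }

private:
    mutable std::mutex               mutex;
    std::shared_ptr<const MapSketch> current;
};

// Two writers start from the same version; only the first update is accepted.
bool second_writer_loses ()
{
    MapManagerSketch mgr;
    const std::shared_ptr<const MapSketch> base = mgr.read ();

    std::shared_ptr<MapSketch> copy_a = std::make_shared<MapSketch> (*base);
    std::shared_ptr<MapSketch> copy_b = std::make_shared<MapSketch> (*base);
    copy_a->bpm = 90.0;
    copy_b->bpm = 140.0;

    const bool a_ok = mgr.update (base, copy_a);
    const bool b_ok = mgr.update (base, copy_b);

    return a_ok && !b_ok && mgr.read ()->bpm == 90.0;
}
```

A writer whose update is rejected has to take a fresh copy and reapply its changes, which is the retry step described above as difficult in current code.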

Ensuring tempo map sanity

Because the tempo map is a thread local value, any thread that might use the tempo map must ensure that the value is maintained. All control surface threads, the main GUI and all process threads ensure that the tempo map pointer is updated at an appropriate time during their execution.

If you add new threads/new event loops, you must ensure that the thread/event loop calls TempoMap::update_thread_tempo_map() at suitable times. The correct place is normally at the very "top" of the event loop, before any input sources or events are handled. This will ensure that all input/event processing sees a single, consistent version of the map. (This is true even if the thread changes the map, btw).
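
The effect of refreshing the thread-local pointer at the top of the loop can be sketched in isolation. The names below (LoopMapSketch, publish, refresh_thread_map, run_one_iteration) are hypothetical and only model the idea; a real event loop would call TempoMap::update_thread_tempo_map() as described above.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-in for a tempo map.
struct LoopMapSketch {
    double bpm;
};

// The canonical version, shared by all threads.
std::mutex canonical_mutex;
std::shared_ptr<const LoopMapSketch> canonical = std::make_shared<LoopMapSketch> (LoopMapSketch { 120.0 });

// Each thread's private pointer to the version it is currently using.
thread_local std::shared_ptr<const LoopMapSketch> thread_map;

// Some thread installs a new canonical version.
void publish (double bpm)
{
    std::lock_guard<std::mutex> lock (canonical_mutex);
    canonical = std::make_shared<LoopMapSketch> (LoopMapSketch { bpm });
}

// Point this thread at the latest canonical version.
void refresh_thread_map ()
{
    std::lock_guard<std::mutex> lock (canonical_mutex);
    thread_map = canonical;
}

// One pass of a hypothetical event loop.
double run_one_iteration ()
{
    refresh_thread_map (); // top of the loop, before any event is handled

    // ... handle input sources and events here; every handler in this
    // iteration sees the same version via thread_map ...

    return thread_map->bpm;
}
```

Between two refreshes the thread keeps using the version it already holds, even if another thread publishes a new one in the meantime.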

The Big Questions

MIDI regions with audio time?
Audio regions with beat time?
Playlists with non-matching time domain?
When and how does the user change the canonical time domain of the tempo map?