Representing Time in Ardour
Within Ardour, there are two fundamentally different ways of specifying positions along a timeline, as well as distances between two such positions.
- Audio time: a linear, monotonic timebase. Units are called "superclocks", which provide a sample-rate independent means to specify time. There are 508032000 superclocks per second.
- Beat time: a linear, monotonic timebase. Units are beats, which are arbitrarily defined as 1/4 notes.
The choice to use 1/4 notes is arbitrary in that any other possible choice for the note length of a beat is a simple integer multiple (or fraction) of a 1/4 note, and thus has no impact on the semantics of "a beat". We could use whole notes as "beats", which would simply introduce a factor of 4 into all arithmetic relating "beats" to quarter notes. We choose 1/4 notes because this is conventional in many other DAWs, the SMF specification (to an extent) and several other related areas.
Beat time and audio time are related to each other via a tempo map. In its simplest form, a tempo map is a data structure that defines a single tempo (number of beats per minute) and a single meter (number of beats per musical measure, or "bar"). Converting between beat time and audio time requires only the tempo.
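For the single-tempo case described above, the conversion is plain arithmetic. The following is a minimal sketch, not code from libtemporal: the helper names beats_to_samples and samples_to_beats are hypothetical, and the tempo and sample rate are passed in explicitly rather than looked up anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: samples spanned by `beats` quarter notes at one
// constant tempo. The result is truncated towards zero.
constexpr int64_t beats_to_samples (double beats, double bpm, int64_t sample_rate)
{
    return static_cast<int64_t> (beats * 60.0 / bpm * static_cast<double> (sample_rate));
}

// Hypothetical helper: the inverse conversion, again for a constant tempo.
constexpr double samples_to_beats (int64_t samples, double bpm, int64_t sample_rate)
{
    return static_cast<double> (samples) * bpm / (60.0 * static_cast<double> (sample_rate));
}

// 4 beats at 120 bpm last 2 seconds, i.e. 96000 samples at 48 kHz.
static_assert (beats_to_samples (4.0, 120.0, 48000) == 96000, "constant tempo conversion");
```

Neither function needs to know where on the timeline the beats are located, which is exactly what stops being true once the tempo map contains more than one tempo.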
However, in more complex tempo maps, there may be arbitrary numbers of different tempos placed at (somewhat) arbitrary positions and any or all of them may be ramped (they are not constant). This means that in general, the relationship between audio time and beat time is complex, and cannot be inferred from trivial arithmetical formulas. In particular, it means that the number of beats represented by a certain number of superclocks (and vice versa) may change along the timeline (because of elements in the tempo map changing the relationship).
Data Types
Fundamental Types
The following fundamental types are used to represent domain-specific values:
- superclock_t: a typedef for int64_t, represents an audio time position counted in superclocks (i.e. sample-rate independent). This data type almost never appears outside of libtemporal, other than in serialization/deserialization code, since we store audio time positions using this type. In a running instance of the program, we know the sample rate, and thus the overwhelmingly common type for audio time values is samplepos_t.
- samplepos_t: a typedef for int64_t, represents an audio time position counted in samples (i.e. sample-rate dependent, unlike superclocks).
- samplecnt_t: a typedef for int64_t, represents an audio time distance counted in samples (i.e. sample-rate dependent, unlike superclocks).
- Beats: a class that represents a music time position or distance counted in 1/4 notes from an implicit origin. It is a fixed point integer type, with 32 bits for the whole beat component, and 32 bits for the fractional beats, denominated by Beats::PPQN (aka Temporal::ticks_per_beat), set at 1920. All regular integer arithmetic operators are available, using both integers and Beats as arguments.
samplepos_t and samplecnt_t are essentially interchangeable because audio time is linear and monotonic. The existence of two different typedefs for the same underlying primitive type serves just a syntactic purpose - making it clear(er) when we are referring to a position or distance.
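The relationship between superclocks and samples can be sketched as follows. This is an illustration only: the function names to_superclock and to_samples are hypothetical, the superclocks-per-second constant is the value quoted earlier in this document, and no attempt is made to guard against overflow for very long durations.

```cpp
#include <cassert>
#include <cstdint>

using superclock_t = int64_t;
using samplepos_t  = int64_t;

// Value quoted earlier in this document.
constexpr superclock_t superclocks_per_second = 508032000;

// Hypothetical helper: samples (rate dependent) to superclocks (rate independent).
// Multiplies first, so very large sample counts could overflow in this sketch.
constexpr superclock_t to_superclock (samplepos_t samples, int64_t sample_rate)
{
    return samples * superclocks_per_second / sample_rate;
}

// Hypothetical helper: superclocks back to samples at a given rate (truncating).
constexpr samplepos_t to_samples (superclock_t sc, int64_t sample_rate)
{
    return sc * sample_rate / superclocks_per_second;
}

// One second is the same number of superclocks at either sample rate.
static_assert (to_superclock (48000, 48000) == to_superclock (44100, 44100), "rate independent");
```

At 48 kHz one sample corresponds to 10584 superclocks, and at 44.1 kHz to 11520, so both common rates divide the constant exactly.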
Distance versus Position
Because of the non-linear relationship between audio time and beat time, whenever we need to represent a distance, we also need to specify where the distance is located. It is not adequate to specify "4 beats" because the audio time duration corresponding to 4 beats may differ at various points along the timeline. Similarly, we cannot know how many beats 20,000 samples represents unless we know where that sample duration is on the timeline. Consequently, all measures of distance must form a pair:
    (distance, at-position)
However, there is one special case which is widely used. That is the case where the distance is being measured from an implicit origin (the "zero" of the timeline, or graphically speaking, its left edge). When we talk about the position of things along the timeline (for example, regions), we are actually talking about the distance between their position and the origin. We therefore need only specify the first element of the pair above, because the "at-position" is implicit (zero).
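To see why the at-position matters, consider a deliberately simplified tempo map made of constant-tempo sections. The sketch below is an assumption-laden illustration, not Ardour's tempo map: TempoSection and beats_to_seconds are hypothetical names, ramps are ignored, and the distance is assumed to be non-negative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical, simplified tempo map entry: a constant tempo that applies
// from start_beat until the next entry. Real tempo maps also allow ramps.
struct TempoSection {
    double start_beat;
    double bpm;
};

// Seconds covered by `distance` beats when measured starting at beat `at`.
// Assumes distance >= 0, at >= 0, and sections sorted with the first at beat 0.
double beats_to_seconds (std::vector<TempoSection> const & map, double distance, double at)
{
    double const end = at + distance;
    double seconds = 0.0;

    for (std::size_t i = 0; i < map.size (); ++i) {
        double const section_end = (i + 1 < map.size ())
            ? map[i + 1].start_beat
            : std::numeric_limits<double>::infinity ();
        // Portion of [at, end) that falls inside this section.
        double const lo = std::max (at, map[i].start_beat);
        double const hi = std::min (end, section_end);
        if (hi > lo) {
            seconds += (hi - lo) * 60.0 / map[i].bpm;
        }
    }
    return seconds;
}
```

With a map of 120 bpm from beat 0 and 60 bpm from beat 8, a distance of 4 beats lasts 2 seconds when located at beat 0, 4 seconds when located at beat 8, and 3 seconds when located at beat 6 (where it straddles the tempo change).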
Consequently, we have two basic time types present within the program: timepos_t and timecnt_t. The former is used to hold positions, that is, distances measured from the timeline origin. The latter is used to hold distances measured from arbitrary positions. In some contexts, it is more accurate to label distances as "durations". Either way, timecnt_t is a pair of values, whereas a timepos_t is a single value.
A different way to think about this question is in terms of the result of a subtraction operation. Suppose we wish to compute two values:
- the distance between two positions
- the position obtained when moving a certain distance earlier from another
    distance = later_position - earlier_position;
    earlier = position - distance;
A little thought will make it clear that the data type returned in the first line must be different from the data type returned in the second line, if we need to be able to (potentially) change time domains. Suppose that the distance variable was a simple scalar value, say 2000 samples. How could we convert that to beats? We need to know where on the timeline it is located so that we could check the tempo map and do the conversion. The same is true if the scalar value was 19 beats, and we wanted to convert to samples. To be able to convert distances between domains, we need to know the position at which the distance occurs, and that means that in general we need a pair of values as indicated above.
int62_t
Both timepos_t and timecnt_t rely on a lower level data type called int62_t. This is a fundamental building block of Ardour's time representation, and is, at its core, just an int64_t. Such a data type offers us 63 bits to represent a numeric value, plus a single sign bit. For int62_t, we "steal" the most significant bit position to act as a flag for the time domain. If the flag is set (i.e. the most significant bit is 1), then the value stored in the remaining 62 bits is in the beat time domain. If the flag is not set (the most significant bit is 0), then the value stored in the remaining 62 bits is in the audio time domain. To avoid potential threading issues, the 64 bit value is always handled using atomic reads and writes, ensuring that the flag bit and the value can never be modified independently. There is never any doubt about the semantics of the value held - it is either an audio time value, or a beat time value. Note that the implementation of int62_t does not use terminology related to time at all: the flag bit is "just a flag bit", and doesn't have any inherent semantics. Those are left for users and derivatives of this class.
Within the code, int62_t is intended to function identically to an int64_t in every way. All integer operations are available and should work as expected. You may add, subtract, divide, take the modulo of (and so on) an int62_t and the result should match your intuition. Any cases in which this does not work should be considered implementation bugs.
One exception to this rule is that, for obvious reasons, the range of an int62_t type is smaller than that of an int64_t. We revisit this issue below, because it intersects with the question of the range of each time domain.
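The idea of packing a flag and a value into one atomically handled 64 bit word can be sketched as below. This is not the real int62_t: the class name FlaggedInt is hypothetical, the choice of bit 62 as the flag position is an assumption of this sketch, and only non-negative values are handled to keep the bit manipulation obvious.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumption of this sketch: the flag lives in bit 62, just below the sign bit.
constexpr int64_t flag_mask = int64_t (1) << 62;

// Hypothetical illustration of a flag + value stored in a single atomic word.
class FlaggedInt {
public:
    FlaggedInt (bool flag, int64_t value) : _v (pack (flag, value)) {}

    // Both accessors read the whole word atomically.
    bool    flagged () const { return (_v.load () & flag_mask) != 0; }
    int64_t value () const   { return _v.load () & ~flag_mask; }

    // Flag and value are always written together, in one atomic store.
    void set (bool flag, int64_t value) { _v.store (pack (flag, value)); }

private:
    static int64_t pack (bool flag, int64_t value)
    {
        // This sketch only supports non-negative values that fit in 62 bits.
        assert (value >= 0 && value < flag_mask);
        return (flag ? flag_mask : 0) | value;
    }

    std::atomic<int64_t> _v;
};
```

Because the flag and the value share one word, a reader can never observe a new flag paired with an old value, which is the property the text above relies on.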
timepos_t
timepos_t IS-A int62_t, and extends its parent class with a variety of methods related to time domains. A timepos_t can be constructed from a samplepos_t (or samplecnt_t) representing an audio time value counted in samples. Alternatively, it can be constructed using a Beats. In both cases, the value stored in the int62_t underlying type is derived from the constructor argument, and is not the literal argument. For samples, we convert to superclocks. For Beats, we convert to "ticks" (beats * PPQN + ticks).
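The beats-to-ticks conversion mentioned above amounts to the following arithmetic. This is a sketch with hypothetical function names (to_ticks, whole_beats, remaining_ticks). Only the PPQN value of 1920 comes from this document, and only non-negative values are considered.

```cpp
#include <cassert>
#include <cstdint>

// Ticks per quarter note, as stated earlier in this document.
constexpr int64_t PPQN = 1920;

// Hypothetical helper: flatten (whole beats, fractional ticks) into ticks.
constexpr int64_t to_ticks (int64_t beats, int64_t ticks)
{
    return beats * PPQN + ticks;
}

// Hypothetical helpers: split a non-negative tick count back into its parts.
constexpr int64_t whole_beats (int64_t total_ticks)     { return total_ticks / PPQN; }
constexpr int64_t remaining_ticks (int64_t total_ticks) { return total_ticks % PPQN; }

// 12 beats and no fractional part.
static_assert (to_ticks (12, 0) == 23040, "12 beats in ticks");
```

A position of one and a half beats, for example, is 1 beat plus 960 ticks, which flattens to 2880 ticks.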
A timepos_t can be used to represent distance also, as long as all users of the value agree on the implicit (zero) origin. A timepos_t representing a distance of 12 beats, for example, implicitly means "12 beats from the timeline origin". For example:
    timepos_t twelve_beats_from_zero (Beats (12));
    // compare with actual timecnt_t distance expressions
    timecnt_t also_twelve_beats_from_zero (Beats (12), timepos_t (BeatTime));
    timecnt_t another_twelve_beats_from_zero (Beats (12)); // audio time domain zero
The second argument in the second constructor denotes zero-in-beat-time, while the lack of a second argument in the third constructor denotes zero-in-audio-time. Zero-in-beat-time and zero-in-audio-time should always be equivalent.
To construct a timepos_t representing zero, we can just call the constructor with no arguments. However, this has some danger because this will use audio time by default, which may not be what is intended. To counter this, the implementation of timepos_t attempts to treat zero in either time domain as identical. This expression must (and does) evaluate to true:
    // compare zero (audio time) with zero (beat time)
    timepos_t (AudioTime) == timepos_t (BeatTime)
There is still room for danger here, however. Adding a beat time value to zero-in-audio-time will generate an audio time value. This may not be what was intended.
Returning to the observations above about the difference between computing a distance between two positions, or the result of shifting a position earlier in time, we must note one very important restriction on timepos_t. With a normal arithmetic type, we would compute either of these using subtraction. But because (a) we need to return different types for each operation and (b) we wish to be clear in the code about what we are actually doing, timepos_t has no accessible operator-(). You must use the distance (timepos_t const &) or earlier (timecnt_t const &) methods to compute either of these, as in:
    timecnt_t four_beats (Beats (4)); // a distance of 4 beats, as measured at zero
    timepos_t four_beats_pos (Beats (4)); // a position of 4 beats from the origin
    timepos_t pos (Beats (18)); // a position of 18 beats from the origin
    timepos_t earlier_pos = pos.earlier (four_beats); // 14 beats from origin
    timecnt_t distance = pos.distance (four_beats_pos); // -14 beats at 18 beats from origin
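The split between distance() and earlier() can be illustrated with a stripped-down pair of types that work in beat ticks only. The names Position and Distance are hypothetical stand-ins, and the sign convention (distance measured from this position to the argument) is inferred from the "-14 beats at 18 beats" comment in the listing above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a (distance, at-position) pair, in ticks.
struct Distance {
    int64_t ticks; // how far
    int64_t at;    // where the distance is measured from
};

// Hypothetical stand-in for a position: a single value from the origin.
class Position {
public:
    explicit Position (int64_t ticks) : _ticks (ticks) {}

    int64_t ticks () const { return _ticks; }

    // Distance from this position to `other`, anchored at this position.
    Distance distance (Position const & other) const
    {
        return Distance { other._ticks - _ticks, _ticks };
    }

    // Position reached by moving `d` earlier from this position.
    Position earlier (Distance const & d) const
    {
        return Position (_ticks - d.ticks);
    }

    // Deliberately no operator-: the two meanings of subtraction return
    // different types, so each gets its own named method.

private:
    int64_t _ticks;
};
```

Writing `pos - other` with these types fails to compile, which forces the caller to say whether a distance or an earlier position is wanted.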
timecnt_t
timecnt_t HAS-A int62_t member (distance), along with a timepos_t member (position). A timecnt_t object can be constructed in the following ways:
    timecnt_t zero_audio; // zero distance, in audio time
    timecnt_t zero_beats (BeatTime); // argument specifies time domain
    timecnt_t about_a_second (48000); // 48k samples, from zero origin
    timecnt_t about_a_second_at_10_beats (48000, Beats(10)); // 48k samples at 10th beat
    timecnt_t four_beats_at_about_two_seconds (Beats(4), 88100);
Coding Guidelines
What Type to Use?
- For a variable/member that will ONLY contain audio time, and is never serialized to disk, use samplepos_t or samplecnt_t for position or distance/duration, respectively.
- For a variable/member that will ONLY contain beat time, use Beats.
- For a variable/member that represents a position on the timeline, use timepos_t.
- For a variable/member that represents a distance/duration along the timeline, use timecnt_t.
- If considering using superclock_t, think long and hard. It's probably not what you want.
- If considering using int62_t, you've made a mistake.
Based on the rules above, we can note that for instance, the code dealing with the fade in/out on an audio region uses samplepos_t/samplecnt_t. These fades have a duration that only makes sense to define in terms of audio time.
Using timeline types and time domains
- Arithmetic operations do not change the time domain of the result. Adding a value in any time domain to an audio time value will always result in another audio time value. Subtracting a value in any time domain from a beat time value will always result in another beat time value. This applies to all arithmetic operators. The consequence of this is that if a timepos_t is created or assigned a value in the "wrong" time domain, no arithmetic operations will alter that time domain.
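The rule that the left-hand operand decides the domain of the result can be sketched as follows. This is a toy model, not timepos_t: Domain, TimeValue, convert and add are hypothetical names, and a single constant tempo is assumed so that conversion does not need a tempo map or a position.

```cpp
#include <cassert>

enum class Domain { Audio, Beat };

// Hypothetical tagged value: samples for Audio, quarter notes for Beat.
struct TimeValue {
    Domain domain;
    double value;
};

// Convert `v` into `target` units, assuming one constant tempo throughout.
double convert (TimeValue v, Domain target, double bpm, double sample_rate)
{
    if (v.domain == target) {
        return v.value;
    }
    if (target == Domain::Audio) {
        return v.value * 60.0 / bpm * sample_rate; // beats -> samples
    }
    return v.value * bpm / (60.0 * sample_rate);   // samples -> beats
}

// The result always stays in the domain of the left-hand operand.
TimeValue add (TimeValue lhs, TimeValue rhs, double bpm, double sample_rate)
{
    return TimeValue { lhs.domain, lhs.value + convert (rhs, lhs.domain, bpm, sample_rate) };
}
```

Adding 4 beats to an audio time zero at 120 bpm and 48 kHz gives 96000 samples in audio time, not 4 beats, which mirrors the danger noted earlier for zero-in-audio-time.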
How To ... ?
- Convert a timeline position from one time domain to another
  Use methods of timepos_t (e.g. samples(), beats()) rather than asking the tempo map directly.
- Convert a timeline distance/duration from one time domain to another
  Use methods of timecnt_t (e.g. samples(), beats()) rather than asking the tempo map directly.
- Access the tempo map
  The tempo map is a "thread local" value - it may have different values in different threads. Any thread may obtain its current tempo map by calling TempoMap::use(). This returns a shared pointer to a tempo map that the thread can safely use for any purpose except modifying the map (see below).
- Modify the tempo map
  The tempo map is a thread-local value that is managed using RCU (read-copy-update). To modify it, a thread must obtain a copy of the map, make changes to it, and then request an update to the "canonical" version of the map (which will fail if the map has been modified since the copy was made).
  - Call TempoMap::fetch_writable() within the thread that will do the update.
  - Make modifications to the tempo map accessed via TempoMap::use(), all from the thread that called fetch_writable().
  - When finished, call TempoMap::update (TempoMap::use()), from the thread that called fetch_writable().
  - If you decide to abandon the changes, be sure to call TempoMap::abort_update(), from the thread that called fetch_writable(). Failure to do so will likely cause crashes.
  There is no guarantee that TempoMap::update() will succeed. Updaters should verify that the update succeeded, although in current code (early 2021), retrying the modifications is difficult.
- Ensuring tempo map sanity
Because the tempo map is a thread local value, any thread that might use the tempo map must ensure that the value is maintained. All control surface threads, the main GUI and all process threads ensure that the tempo map pointer is updated at an appropriate time during their execution.
If you add new threads/new event loops, you must ensure that the thread/event loop calls TempoMap::update_thread_tempo_map() at suitable times. The correct place is normally at the very "top" of the event loop, before any input sources or events are handled. This will ensure that all input/event processing sees a single, consistent version of the map. (This is true even if the thread changes the map, btw).
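The copy, modify, update cycle and the per-thread refresh described above can be sketched generically. This is not the TempoMap API: Map, MapManager, writable_copy, thread_map and refresh_thread_map are all hypothetical names, and a mutex stands in for whatever RCU machinery the real code uses. It only illustrates why an update can fail and why each thread refreshes its own pointer.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a tempo map.
struct Map {
    double bpm = 120.0;
};

class MapManager {
public:
    MapManager () : _current (std::make_shared<Map> ()) {}

    // Readers receive an immutable snapshot of the canonical map.
    std::shared_ptr<Map const> read () const
    {
        std::lock_guard<std::mutex> lk (_lock);
        return _current;
    }

    // Writers modify a private copy of the snapshot they started from.
    static std::shared_ptr<Map> writable_copy (std::shared_ptr<Map const> const & base)
    {
        return std::make_shared<Map> (*base);
    }

    // Publish `changed`. Fails if the canonical map is no longer `base`,
    // i.e. someone else published an update after the copy was made.
    bool update (std::shared_ptr<Map const> const & base, std::shared_ptr<Map> const & changed)
    {
        std::lock_guard<std::mutex> lk (_lock);
        if (_current != base) {
            return false;
        }
        _current = changed;
        return true;
    }

private:
    mutable std::mutex         _lock;
    std::shared_ptr<Map const> _current;
};

// Each thread keeps its own pointer and refreshes it at the top of its loop,
// so one pass of event handling sees a single consistent map.
thread_local std::shared_ptr<Map const> thread_map;

void refresh_thread_map (MapManager const & mgr)
{
    thread_map = mgr.read ();
}
```

In this sketch a thread that published a change still sees the old map through thread_map until it calls refresh_thread_map() again, and a second update attempted from the same stale base is rejected.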
The Big Questions
- MIDI regions with audio time?
- Audio regions with beat time?
- Playlists with non-matching time domain?
- When and how does the user change the canonical time domain of the tempo map?