On Encodings

Status: Proposal (as of 17.01.2024)

Those encodings referenced from the Meadowcap specification have status Candidate.

A perhaps curious feature of the Willow data model is that its specifications rarely talk about encodings. We11Let’s be honest: Aljoscha strongly believe that specifications should concern themselves with purely logical data types as long as possible, treating encodings as a minor and ultimately interchangeable detail. When specifications define concepts in terms of their encodings, results usually end up miserably underspecified (see JSON) or full of incidental complexity (see XML).

Nevertheless, protocols that deal with persistent storage and network transmissions eventually have to serialise data. In this document, we give both some generic definitions around arbitrary encodings, and some specific encodings that recur throughout the Willow family of specifications.

Encodings, In Detail

The Willow protocols are generic over user-supplied parameters. Invariably, some values of the user-supplied types need to be encoded, so there also have to be user-supplied definitions of how to encode things. But not every function that turns values into byte strings defines an encoding.

Hence, we formally define what we mean by encodings. We first give a succinct, rather mathematical definition, followed by a more accessible English explanation.

An encoding function encode_s for a set S is a function from S to the set of bytestrings, such that there exists a function decode_s such that:

for every s in S and every bytestring b that starts with encode_s(s), we have $d eco d e_s (b) = s,$ and
for every s in S and every bytestring b that does not start with encode_s(s), we have $d eco d e_s (b) \neq = s .$

In plain language:

encode_s must deterministically map any value of type S to exactly one bytestring.
Input Output Correctness
“turtle”
“turtle” | “tortoise”
Desired behaviour for encode_animal.
Further, there must be a corresponding decoding function decode_s. This function must map valid encodings back to values of type S, and it must report a decoding error for any input string that does not start with a valid encoding of any value of type S.
Input Output Correctness
“turtle”
“tortoise”
“turtle”
“tortoise”
Desired behaviour for decode_animal.
And finally, appending arbitrary bytes to a valid encoding must not cause any change in the return value of decode_s.This requirement makes it so we can encode sequences of multiple values by simply concatenating their encodings, without running the risk that the resulting string would be a valid encoding of a different value as well.
Input Output Correctness
“turtle”
“turtleneck”
“turtle”
“turtleneck”
“turtle”
“turtleneck”
Desired behaviour for decode_animal.

Encoding Techniques

We now prepare to define some specific encodings for several datatypes of Willow.

When designing encodings, there rarely exists a single best solution. Different scenarios might warrant different trade-offs between encoding and decoding time, encoding size, or simplicity. Furthermore, additional context can often enable more efficient encodings than general-purpose solutions. For example, a database that groups Entries by namespace_id would not need to encode the namespace_id of every individual Entry.

For these reasons, we encourage you to take the following suggested encodings with a grain of salt, and to look for contextual information that you could leverage to obtain even more efficient encodings for your specific use cases.

To encode small numbers with fewer bytes than large numbers, we define for any U64 n the value compact_width(n) as

$1,$ if $n < 256,$ or
$2,$ if $n < 25 6^{2},$ or
$4,$ if $n < 25 6^{4},$ or
$8,$ otherwise.

Data Model Encodings

When encoding Paths, we want to use fixed-width unsigned integers of minimal width. Hence, we define:

path_length_power is the least natural number such that $25 6^{p a t h_l e n g t h_p o w er} \geq ma x_co m p o n e n t_l e n g t h .$ We can represent the length of any Component in path_length_power bytes. UPathLengthPower denotes the type of numbers between zero (inclusive) and $25 6^{p a t h_l e n g t h_p o w er}$ (exclusive).

path_count_power is the least natural number such that $25 6^{p a t h_co u n t_p o w er} \geq ma x_co m p o n e n t_co u n t .$ We can represent the number of Components of any Path in path_count_power bytes. UPathCountPower denotes the type of numbers between zero (inclusive) and $25 6^{p a t h_co u n t_p o w er}$ (exclusive).

`encode_path`

To encode a Path p, we define encode_path(p) as the concatenation of

the number of Components of p, encoded as a big-endian UPathCountPower,
for each Component of p,
- the length of the Component, encoded as a big-endian UPathLengthPower, followed by the bytes of the Component.

`encode_entry`

To encode an Entry e, let encode_namespace_id be an encoding function for NamespaceId, let encode_subspace_id be an encoding function for SubspaceId, and let encode_payload_digest be an encoding function for PayloadDigest. We then define encode_entry(e) as the concatenation of:

`encode_namespace_id(e.namespace_id)`
`encode_subspace_id(e.subspace_id)`
`encode_path(e.path)`
e.timestamp, encoded as a big-endian U64
e.payload_length, encoded as a big-endian U64
`encode_payload_digest(e.payload_digest)`

Relativity

When encoding Willow objects, we can often achieve smaller encoding sizes by encoding how some object differs from another. In this section, we define several such relative encodings.

In all subsequent definitions, whenever the value open is part of a numeric computation or comparison, it should be treated as a very large number, say, $2^{9999} .$ The definitions all ensure that the resulting values never have to be encoded.

`encode_path_relative_path`

To encode a Path p relative to a reference Path ref, we define encode_path_relative_path(p, ref) as the concatenation of:

The length of the longest common prefix of p and ref, encoded as a UPathCountPower.
encode_path of the Path obtained by removing that longest common prefix from p.

`encode_entry_relative_entry`

To encode an Entry e relative to a reference Entry ref, we first define time_diff as the absolute value of e.timestamp - ref.timestamp. We then define encode_entry_relative_entry(e, ref) as the concatenation of:

Bit	Big-endian bitfield
0	Encode e.namespace_id?`1` iff `e.namespace_id != ref.namespace_id`
1	Encode e.subspace_id?`1` iff `e.subspace_id != ref.subspace_id`
2	Add or subtract time_diff from ref.timestamp?`1` iff `e.timestamp - ref.timestamp > 0`
3	always `0`
4, 5	2-bit integer `n` such that `2^n` gives `compact_width(time_diff)` Bit 4 is `1` iff `compact_width(time_diff)` is `4` or `8`. Bit 5 is `1` iff `compact_width(time_diff)` is `2` or `8`.
6, 7	2-bit integer `n` such that `2^n` gives `compact_width(e.payload_length)` Bit 6 is `1` iff `compact_width(e.payload_length)` is `4` or `8`. Bit 7 is `1` iff `compact_width(e.payload_length)` is `2` or `8`.
`encode_namespace_id(e.namespace_id)`, or the empty string, if `e.namespace_id == ref.namespace_id`
`encode_subspace_id(e.subspace_id)`, or the empty string, if `e.subspace_id == ref.subspace_id`
`encode_path_relative_path(e.path, ref.path)`
time_diff, encoded as an unsigned, big-endian `compact_width(time_diff)`-byte integer
e.payload_length, encoded as an unsigned, big-endian `compact_width(e.payload_length)`-byte integer
`encode_payload_digest(e.payload_digest)`

`encode_entry_in_namespace_area`

To encode an Entry e that is included in an outer Area out in a namespace of NamespaceId namespace_id, we first define time_diff as the minimum of e.timestamp - out.times.start and out.times.end - e.timestamp. We then define encode_entry_in_namespace_area(e, out, namespace_id) as the concatenation of:

Bit	Big-endian bitfield
0	Encode e.subspace_id?`1` iff `out.subspace_id == any`
1	Add time_diff to out.times.start, or subtract from out.times.end?`1` iff `e.timestamp - out.times.start <= out.times.end - e.timestamp`
2, 3	2-bit integer `n` such that `2^n` gives `compact_width(time_diff)` Bit 2 is `1` iff `compact_width(time_diff)` is `4` or `8`. Bit 3 is `1` iff `compact_width(time_diff)` is `2` or `8`.
4, 5	2-bit integer `n` such that `2^n` gives `compact_width(e.payload_length)` Bit 4 is `1` iff `compact_width(e.payload_length)` is `4` or `8`. Bit 5 is `1` iff `compact_width(e.payload_length)` is `2` or `8`.
6, 7	always `0`
`encode_subspace_id(e.subspace_id)`, or the empty string, if `out.subspace_id != any`
`encode_path_relative_path(e.path, out.path)`
time_diff, encoded as an unsigned, big-endian `compact_width(time_diff)`-byte integer
e.payload_length, encoded as an unsigned, big-endian `compact_width(e.payload_length)`-byte integer
`encode_payload_digest(e.payload_digest)`

`encode_entry_in_namespace_3drange`

To encode an Entry e that is included in a 3dRange out in a namespace of NamespaceId namespace_id, we first define time_diff as the minimum absolute value of e.timestamp - out.times.start and e.timestamp - out.times.end. We then define encode_entry_in_namespace_3drange(e, out, namespace_id) as the concatenation of:

Bit	Big-endian bitfield
0	Encode e.subspace_id?`1` iff `e.subspace_id != out.subspaces.start`
1	Encode e.path relative to out.paths.start or to out.paths.end?`1` iff the longest common prefix of e.path and out.paths.start is at least as long as the longest common prefix of e.path and out.paths.end
2	Add time_diff to out.times.start, or subtract it from out.times.end?`1` iff `time_diff == abs(e.timestamp - out.times.start)`
3	always `0`
4, 5	2-bit integer `n` such that `2^n` gives `compact_width(time_diff)` Bit 4 is `1` iff `compact_width(time_diff)` is `4` or `8`. Bit 5 is `1` iff `compact_width(time_diff)` is `2` or `8`.
6, 7	2-bit integer `n` such that `2^n` gives `compact_width(e.payload_length)` Bit 6 is `1` iff `compact_width(e.payload_length)` is `4` or `8`. Bit 7 is `1` iff `compact_width(e.payload_length)` is `2` or `8`.
`encode_subspace_id(e.subspace_id)`, or the empty string, if `e.subspace_id == out.subspaces.start`
`encode_path_relative_path(e.path, out.paths.start)` if the longest common prefix of e.path and out.paths.start is at least as long as the longest common prefix of e.path and out.paths.end, otherwise `encode_path_relative_path(e.path, out.paths.end)`
time_diff, encoded as an unsigned, big-endian `compact_width(time_diff)`-byte integer
e.payload_length, encoded as an unsigned, big-endian `compact_width(e.payload_length)`-byte integer
`encode_payload_digest(e.payload_digest)`

`encode_area_in_area`

To encode an Area a that is included in an outer Area out, we first define start_diff as the minimum of a.times.start - out.times.start and out.times.end - a.times.start, and we define end_diff as the minimum of a.times.end - out.times.start and out.times.end - a.times.end. We then define encode_area_in_area(a, out) as the concatenation of:

Bit	Big-endian bitfield
0	Encode a.subspace_id?`1` iff `a.subspace_id != out.subspace_id`
1	Encode end value of a.times?`1` iff `a.times.end == open`
2	Add start_diff to out.times.start, or subtract from out.times.end?`1` iff `start_diff == a.times.start - out.times.start`
3	Add end_diff to out.times.start, or subtract from out.times.end?`0` if `a.times.end == open`, otherwise `1` iff `end_diff == a.times.end - out.times.start`
4, 5	2-bit integer `n` such that `2^n` gives `compact_width(start_diff)` Bit 4 is `1` iff `compact_width(start_diff)` is `4` or `8`. Bit 5 is `1` iff `compact_width(start_diff)` is `2` or `8`.
6, 7	2-bit integer `n` such that `2^n` gives `compact_width(end_diff)` Bit 6 is `1` iff `compact_width(end_diff)` is `4` or `8`. Bit 7 is `1` iff `compact_width(end_diff)` is `2` or `8`.
`encode_subspace_id(a.subspace_id)`, or the empty string, if `a.subspace_id == out.subspace_id`
`encode_path_relative_path(a.path, out.path)`
start_diff, encoded as an unsigned, big-endian `compact_width(start_diff)`-byte integer
end_diff, encoded as an unsigned, big-endian `compact_width(end_diff)`-byte integer, or the empty string, if `a.times.end == open`

`encode_3drange_relative_3drange`

To encode a 3dRange ran relative to a reference 3dRange ref, we first define

start_to_start as the absolute value of ran.times.start - ref.times.start,
start_to_end as the absolute value of ran.times.start - ref.times.end,
end_to_start as the absolute value of ran.times.end - ref.times.start,
end_to_end as the absolute value of ran.times.end - ref.times.end,
start_time_diff as the minimum of start_to_start and start_to_end, and
end_time_diff as the minimum of end_to_start and end_to_end.

We then define encode_3drange_relative_3drange(ran, ref) as the concatenation of:

Bit	Big-endian bitfield
0, 1	Encode ran.subspaces.start? `01` if `ran.subspaces.start == ref.subspaces.start`, `10` if `ran.subspaces.start == ref.subspaces.end`, `11` otherwise.
2, 3	Encode ran.subspaces.end? `00` if `ran.subspaces.end == open`, and else `01` if `ran.subspaces.end == ref.subspaces.start`, `10` if `ran.subspaces.end == ref.subspaces.end`, `11` otherwise.
4	Encode ran.paths.start relative to ref.paths.start or to ref.paths.end?`1` iff the longest common prefix of ran.paths.start and ref.paths.start is at least as long as the longest common prefix of ran.paths.start and ref.paths.end
5	`1` iff `ran.paths.end == open`
6	Encode ran.paths.end relative to ref.paths.start or to ref.paths.end (if at all)? `0` if `ran.paths.end == open`, otherwise `1` iff the longest common prefix of ran.paths.end and ref.paths.start is at least as long as the longest common prefix of ran.paths.end and ref.paths.end
7	`1` iff `ran.times.end == open`
8	Encode ran.times.start relative to ref.times.start or ref.times.end?`1` iff `start_to_start <= start_to_end`
9	Add or subtract start_time_diff?`1` iff bit 8 is `1` and `ran.times.start >= ref.times.start`, or bit 8 is `0` and `ran.times.start >= ref.times.end`.
10, 11	2-bit integer `n` such that `2^n` gives `compact_width(start_time_diff)` Bit 10 is `1` iff `compact_width(start_time_diff)` is `4` or `8`. Bit 11 is `1` iff `compact_width(start_time_diff)` is `2` or `8`.
12	Encode ran.times.end relative to ref.times.start or ref.times.end (if at all)? `0` if `ran.times.end == open`, otherwise `1` iff `end_to_start <= end_to_end`
13	Add or subtract end_time_diff (if encoding it at all)? `0` if `ran.times.end == open`, otherwise `1` iff bit 12 is `1` and `ran.times.end >= ref.times.start`, or bit 12 is `0` and `ran.times.end >= ref.times.end`.
14, 15	ignored, or 2-bit integer `n` such that `2^n` gives `compact_width(end_time_diff)` `00` if `ran.times.end == open`, otherwise: Bit 14 is `1` iff `compact_width(end_time_diff)` is `4` or `8`. Bit 15 is `1` iff `compact_width(end_time_diff)` is `2` or `8`.
`encode_subspace_id(ran.subspaces.start)`, or the empty string if `ran.subspaces.start == ref.subspaces.start` or `ran.subspaces.start == ref.subspaces.end`
`encode_subspace_id(ran.subspaces.end)`, or the empty string if `ran.subspaces.end == open`, `ran.subspaces.end == ref.subspaces.start` or `ran.subspaces.end == ref.subspaces.end`
`encode_path_relative_path(ran.paths.start, ref.paths.start)` if the longest common prefix of ran.paths.start and ref.paths.start is at least as long as the longest common prefix of ran.paths.start and ref.paths.end, otherwise `encode_path_relative_path(ran.paths.start, ref.paths.end)`
the empty string if `ran.paths.end == open`, otherwise: `encode_path_relative_path(ran.paths.end, ref.paths.start)` if the longest common prefix of ran.paths.end and ref.paths.start is at least as long as the longest common prefix of ran.paths.end and ref.paths.end, otherwise `encode_path_relative_path(ran.paths.end, ref.paths.end)`
start_time_diff, encoded as an unsigned, big-endian `compact_width(start_time_diff)`-byte integer
end_time_diff, encoded as an unsigned, big-endian `compact_width(end_time_diff)`-byte integer, or the empty string, if `end_time_diff.times.end == open`

Capabilities

Encodings for Meadowcap and McSubspaceCapabilities.

`encode_subspace_capability`

To encode a McSubspaceCapability c, we define encode_mc_subspace_capability(c) as the concatenation of:

Bit	Big-endian bitfield
0 – 7	`11111111` if the length of c.delegations is greater or equal to 2^32, `11111110` if the length of c.delegations is greater or equal to 2^16, `11111101` if the length of c.delegations is greater or equal to 256, `11111100` if the length of c.delegations is greater or equal to 252, or the length of c.delegations otherwise.
`encode_namespace_pk(c.namespace_key)`
`encode_user_pk(c.user_key)`
`encode_namespace_sig(c.initial_authorisation)`
the length of c.delegations, encoded as an unsigned, big-endian `compact_width(the length of c.delegations)`-byte integer, or the empty string, if the length of c.delegations is less than or equal to 251
for each pair `(pk, sig)` in c.delegations the concatenation of: `encode_user_pk(pk)` `encode_user_sig(sig)`

`encode_mc_capability`

To encode a McCapability c whose granted area is know to be included in some22If no smaller containing Area is known from context, use the full area. Area out, we define encode_mc_capability(c) depending on whether c.inner is a CommunalCapability or an OwnedCapability.

If c.inner is a CommunalCapability, then encode_mc_capability(c) is the concatenation of:

Bit	Big-endian bitfield
0, 1	`00`, if `c.inner.access_mode == read`, `01` otherwise.
2 – 7	`111111` if the length of c.inner.delegations is greater or equal to 2^32, `111110` if the length of c.inner.delegations is greater or equal to 2^16, `111101` if the length of c.inner.delegations is greater or equal to 256, `111100` if the length of c.inner.delegations is greater or equal to 60, or the length of c.inner.delegations otherwise.
`encode_namespace_pk(c.inner.namespace_key)`
`encode_user_pk(c.inner.user_key)`
the length of c.inner.delegations, encoded as an unsigned, big-endian `compact_width(the length of c.inner.delegations)`-byte integer, or the empty string, if the length of c.inner.delegations is less than or equal to 59
for each triplet `(area, pk, sig)` in c.inner.delegations the concatenation of: `encode_area_in_area(area, prev_area)`, where prev_area is the area of the previous triplet, or out for the first triplet. `encode_user_pk(pk)` `encode_user_sig(sig)`

If c.inner is an OwnedCapability, then encode_mc_capability(c) is the concatenation of:

Bit	Big-endian bitfield
0, 1	`10`, if `c.inner.access_mode == read`, `11` otherwise.
2 – 7	`111111` if the length of c.inner.delegations is greater or equal to 2^32, `111110` if the length of c.inner.delegations is greater or equal to 2^16, `111101` if the length of c.inner.delegations is greater or equal to 256, `111100` if the length of c.inner.delegations is greater or equal to 60, or the length of c.inner.delegations otherwise.
`encode_namespace_pk(c.inner.namespace_key)`
`encode_user_pk(c.inner.user_key)`
`encode_namespace_sig(c.inner.initial_authorisation)`
the length of c.inner.delegations, encoded as an unsigned, big-endian `compact_width(the length of c.inner.delegations)`-byte integer, or the empty string, if the length of c.inner.delegations is less than or equal to 59
for each triplet `(area, pk, sig)` in c.inner.delegations the concatenation of: `encode_area_in_area(area, prev_area)`, where prev_area is the area of the previous triplet, or out for the first triplet. `encode_user_pk(pk)` `encode_user_sig(sig)`

Input	Output	Correctness
`“turtle”`
`“tortoise”`
`“turtle”`
`“tortoise”`

Input	Output	Correctness
`“turtle”`
`“turtleneck”`
`“turtle”`
`“turtleneck”`
`“turtle”`
`“turtleneck”`