On Encodings
Status: Proposal (as of 17.01.2024)
Those encodings referenced from the Meadowcap specification have status Candidate.A perhaps curious feature of the Willow data model is that its specifications rarely talk about encodings. We11Let’s be honest: Aljoscha strongly believe that specifications should concern themselves with purely logical data types as long as possible, treating encodings as a minor and ultimately interchangeable detail. When specifications define concepts in terms of their encodings, results usually end up miserably underspecified (see JSON) or full of incidental complexity (see XML).
Nevertheless, protocols that deal with persistent storage and network transmissions eventually have to serialise data. In this document, we give both some generic definitions around arbitrary encodings, and some specific encodings that recur throughout the Willow family of specifications.
Encodings, In Detail
The Willow protocols are generic over usersupplied parameters. Invariably, some values of the usersupplied types need to be encoded, so there also have to be usersupplied definitions of how to encode things. But not every function that turns values into byte strings defines an encoding.
Hence, we formally define what we mean by encodings. We first give a succinct, rather mathematical definition, followed by a more accessible English explanation.
An encoding function encode_s for a set S is a function from S to the set of bytestrings, such that there exists a function decode_s such that:
 for every s in S and every bytestring b that starts with
encode_s(s)
, we have $decode_s(b)=s,$ and  for every s in S and every bytestring b that does not start with
encode_s(s)
, we have $decode_s(b)=s.$
In plain language:
 encode_s must deterministically map any value of type S to exactly one bytestring.
Input Output Correctness “turtle”
“turtle”  “tortoise”
Desired behaviour for encode_animal.  Further, there must be a corresponding decoding function decode_s. This function must map valid encodings back to values of type S, and it must report a decoding error for any input string that does not start with a valid encoding of any value of type S.
Input Output Correctness “turtle”
“tortoise”
“turtle”
“tortoise”
Desired behaviour for decode_animal.  And finally, appending arbitrary bytes to a valid encoding must not cause any change in the return value of decode_s.This requirement makes it so we can encode sequences of multiple values by simply concatenating their encodings, without running the risk that the resulting string would be a valid encoding of a different value as well.
Input Output Correctness “turtle”
“turtleneck”
“turtle”
“turtleneck”
“turtle”
“turtleneck”
Desired behaviour for decode_animal.
Encoding Techniques
We now prepare to define some specific encodings for several datatypes of Willow.
When designing encodings, there rarely exists a single best solution. Different scenarios might warrant different tradeoffs between encoding and decoding time, encoding size, or simplicity. Furthermore, additional context can often enable more efficient encodings than generalpurpose solutions. For example, a database that groups Entries by namespace_id would not need to encode the namespace_id of every individual Entry.
For these reasons, we encourage you to take the following suggested encodings with a grain of salt, and to look for contextual information that you could leverage to obtain even more efficient encodings for your specific use cases.
To encode small numbers with fewer bytes than large numbers, we define for any U64 n the value compact_width(n)
as
 $1,$ if $n<256,$ or
 $2,$ if $n<256_{2},$ or
 $4,$ if $n<256_{4},$ or
 $8,$ otherwise.
Data Model Encodings
When encoding Paths, we want to use fixedwidth unsigned integers of minimal width. Hence, we define:
path_length_power is the least natural number such that $256_{path_length_power}≥max_component_length.$ We can represent the length of any Component in path_length_power bytes. UPathLengthPower denotes the type of numbers between zero (inclusive) and $256_{path_length_power}$ (exclusive).
path_count_power is the least natural number such that $256_{path_count_power}≥max_component_count.$ We can represent the number of Components of any Path in path_count_power bytes. UPathCountPower denotes the type of numbers between zero (inclusive) and $256_{path_count_power}$ (exclusive).
encode_path
To encode a Path p, we define encode_path(p)
as the concatenation of
 the number of Components of p, encoded as a bigendian UPathCountPower,
 for each Component of p,
 the length of the Component, encoded as a bigendian UPathLengthPower, followed by the bytes of the Component.
encode_entry
To encode an Entry e, let encode_namespace_id be an encoding function for NamespaceId, let encode_subspace_id be an encoding function for SubspaceId, and let encode_payload_digest be an encoding function for PayloadDigest. We then define encode_entry(e)
as the concatenation of:
encode_namespace_id(e.namespace_id)  
encode_subspace_id(e.subspace_id)  
encode_path(e.path)  
e.timestamp, encoded as a bigendian U64  
e.payload_length, encoded as a bigendian U64  
encode_payload_digest(e.payload_digest) 
Relativity
When encoding Willow objects, we can often achieve smaller encoding sizes by encoding how some object differs from another. In this section, we define several such relative encodings.
In all subsequent definitions, whenever the value open is part of a numeric computation or comparison, it should be treated as a very large number, say, $2_{9999}.$ The definitions all ensure that the resulting values never have to be encoded.
encode_path_relative_path
To encode a Path p relative to a reference Path ref, we define encode_path_relative_path(p, ref)
as the concatenation of:
The length of the longest common prefix of p and ref, encoded as a UPathCountPower.  
encode_path of the Path obtained by removing that longest common prefix from p. 
encode_entry_relative_entry
To encode an Entry e relative to a reference Entry ref, we first define time_diff as the absolute value of e.timestamp  ref.timestamp
. We then define encode_entry_relative_entry(e, ref)
as the concatenation of:
Bit  Bigendian bitfield 

0  Encode e.namespace_id?1 iff e.namespace_id != ref.namespace_id 
1  Encode e.subspace_id?1 iff e.subspace_id != ref.subspace_id 
2  Add or subtract time_diff from ref.timestamp?1 iff e.timestamp  ref.timestamp > 0 
3  always 0 
4, 5  2bit integer n such that 2^n gives compact_width(time_diff) 
6, 7  2bit integer n such that 2^n gives compact_width(e.payload_length) 
encode_namespace_id(e.namespace_id) , or the empty string, if e.namespace_id == ref.namespace_id  
encode_subspace_id(e.subspace_id) , or the empty string, if e.subspace_id == ref.subspace_id  
encode_path_relative_path(e.path, ref.path)  
time_diff, encoded as an unsigned, bigendian compact_width(time_diff) byte integer  
e.payload_length, encoded as an unsigned, bigendian compact_width(e.payload_length) byte integer  
encode_payload_digest(e.payload_digest) 
encode_entry_in_namespace_area
To encode an Entry e that is included in an outer Area out in a namespace of NamespaceId namespace_id, we first define time_diff as the minimum of e.timestamp  out.times.start
and out.times.end  e.timestamp
. We then define encode_entry_in_namespace_area(e, out, namespace_id)
as the concatenation of:
Bit  Bigendian bitfield 

0  Encode e.subspace_id?1 iff out.subspace_id == any 
1  Add time_diff to out.times.start, or subtract from out.times.end?1 iff e.timestamp  out.times.start <= out.times.end  e.timestamp 
2, 3  2bit integer n such that 2^n gives compact_width(time_diff) 
4, 5  2bit integer n such that 2^n gives compact_width(e.payload_length) 
6, 7  always 0 
encode_subspace_id(e.subspace_id) , or the empty string, if out.subspace_id != any  
encode_path_relative_path(e.path, out.path)  
time_diff, encoded as an unsigned, bigendian compact_width(time_diff) byte integer  
e.payload_length, encoded as an unsigned, bigendian compact_width(e.payload_length) byte integer  
encode_payload_digest(e.payload_digest) 
encode_entry_in_namespace_3drange
To encode an Entry e that is included in a 3dRange out in a namespace of NamespaceId namespace_id, we first define time_diff as the minimum absolute value of e.timestamp  out.times.start
and e.timestamp  out.times.end
. We then define encode_entry_in_namespace_3drange(e, out, namespace_id)
as the concatenation of:
Bit  Bigendian bitfield 

0  Encode e.subspace_id?1 iff e.subspace_id != out.subspaces.start 
1  Encode e.path relative to out.paths.start or to out.paths.end?1 iff the longest common prefix of e.path and out.paths.start is at least as long as the longest common prefix of e.path and out.paths.end 
2  Add time_diff to out.times.start, or subtract it from out.times.end?1 iff time_diff == abs(e.timestamp  out.times.start) 
3  always 0 
4, 5  2bit integer n such that 2^n gives compact_width(time_diff) 
6, 7  2bit integer n such that 2^n gives compact_width(e.payload_length) 
encode_subspace_id(e.subspace_id) , or the empty string, if e.subspace_id == out.subspaces.start  
encode_path_relative_path(e.path, out.paths.start) if the longest common prefix of e.path and out.paths.start is at least as long as the longest common prefix of e.path and out.paths.end, otherwise encode_path_relative_path(e.path, out.paths.end)  
time_diff, encoded as an unsigned, bigendian compact_width(time_diff) byte integer  
e.payload_length, encoded as an unsigned, bigendian compact_width(e.payload_length) byte integer  
encode_payload_digest(e.payload_digest) 
encode_area_in_area
To encode an Area a that is included in an outer Area out, we first define start_diff as the minimum of a.times.start  out.times.start
and out.times.end  a.times.start
, and we define end_diff as the minimum of a.times.end  out.times.start
and out.times.end  a.times.end
. We then define encode_area_in_area(a, out)
as the concatenation of:
Bit  Bigendian bitfield 

0  Encode a.subspace_id?1 iff a.subspace_id != out.subspace_id 
1  Encode end value of a.times?1 iff a.times.end == open 
2  Add start_diff to out.times.start, or subtract from out.times.end?1 iff start_diff == a.times.start  out.times.start 
3  Add end_diff to out.times.start, or subtract from out.times.end?0 if a.times.end == open , otherwise 1 iff end_diff == a.times.end  out.times.start 
4, 5  2bit integer n such that 2^n gives compact_width(start_diff) 
6, 7  2bit integer n such that 2^n gives compact_width(end_diff) 
encode_subspace_id(a.subspace_id) , or the empty string, if a.subspace_id == out.subspace_id  
encode_path_relative_path(a.path, out.path)  
start_diff, encoded as an unsigned, bigendian compact_width(start_diff) byte integer  
end_diff, encoded as an unsigned, bigendian compact_width(end_diff) byte integer, or the empty string, if a.times.end == open 
encode_3drange_relative_3drange
To encode a 3dRange ran relative to a reference 3dRange ref, we first define
 start_to_start as the absolute value of
ran.times.start  ref.times.start
,  start_to_end as the absolute value of
ran.times.start  ref.times.end
,  end_to_start as the absolute value of
ran.times.end  ref.times.start
,  end_to_end as the absolute value of
ran.times.end  ref.times.end
,  start_time_diff as the minimum of start_to_start and start_to_end, and
 end_time_diff as the minimum of end_to_start and end_to_end.
encode_3drange_relative_3drange(ran, ref)
as the concatenation of:Bit  Bigendian bitfield 

0, 1  Encode ran.subspaces.start?11 otherwise. 
2, 3  Encode ran.subspaces.end?11 otherwise. 
4  Encode ran.paths.start relative to ref.paths.start or to ref.paths.end?1 iff the longest common prefix of ran.paths.start and ref.paths.start is at least as long as the longest common prefix of ran.paths.start and ref.paths.end 
5  1 iff ran.paths.end == open 
6  Encode ran.paths.end relative to ref.paths.start or to ref.paths.end (if at all)? 
7  1 iff ran.times.end == open 
8  Encode ran.times.start relative to ref.times.start or ref.times.end?1 iff start_to_start <= start_to_end 
9  Add or subtract start_time_diff?1 iff bit 8 is 1 and ran.times.start >= ref.times.start , or bit 8 is 0 and ran.times.start >= ref.times.end . 
10, 11  2bit integer n such that 2^n gives compact_width(start_time_diff) 
12  Encode ran.times.end relative to ref.times.start or ref.times.end (if at all)? 
13  Add or subtract end_time_diff (if encoding it at all)? 
14, 15  ignored, or 2bit integer n such that 2^n gives compact_width(end_time_diff) 
encode_subspace_id(ran.subspaces.start) , or the empty string if ran.subspaces.start == ref.subspaces.start or ran.subspaces.start == ref.subspaces.end  
encode_subspace_id(ran.subspaces.end) , or the empty string if ran.subspaces.end == open , ran.subspaces.end == ref.subspaces.start or ran.subspaces.end == ref.subspaces.end  
encode_path_relative_path(ran.paths.start, ref.paths.start) if the longest common prefix of ran.paths.start and ref.paths.start is at least as long as the longest common prefix of ran.paths.start and ref.paths.end, otherwise encode_path_relative_path(ran.paths.start, ref.paths.end)  
start_time_diff, encoded as an unsigned, bigendian compact_width(start_time_diff) byte integer  
end_time_diff, encoded as an unsigned, bigendian compact_width(end_time_diff) byte integer, or the empty string, if end_time_diff.times.end == open 
Capabilities
Encodings for Meadowcap and McSubspaceCapabilities.
encode_subspace_capability
To encode a McSubspaceCapability c, we define encode_mc_subspace_capability(c)
as the concatenation of:
Bit  Bigendian bitfield 

0 – 7  
encode_namespace_pk(c.namespace_key)  
encode_user_pk(c.user_key)  
encode_namespace_sig(c.initial_authorisation)  
the length of c.delegations, encoded as an unsigned, bigendian compact_width(the length of c.delegations) byte integer, or the empty string, if the length of c.delegations is less than or equal to 251  
for each pair (pk, sig) in c.delegations the concatenation of: 
encode_mc_capability
To encode a McCapability c whose granted area is know to be included in some22If no smaller containing Area is known from context, use the full area. Area out, we define encode_mc_capability(c)
depending on whether c.inner is a CommunalCapability or an OwnedCapability.
If c.inner is a CommunalCapability, then encode_mc_capability(c)
is the concatenation of:
Bit  Bigendian bitfield 

0, 1  00 , if c.inner.access_mode == read ,01 otherwise. 
2 – 7  
encode_namespace_pk(c.inner.namespace_key)  
encode_user_pk(c.inner.user_key)  
the length of c.inner.delegations, encoded as an unsigned, bigendian compact_width(the length of c.inner.delegations) byte integer, or the empty string, if the length of c.inner.delegations is less than or equal to 59  
for each triplet (area, pk, sig) in c.inner.delegations the concatenation of:

If c.inner is an OwnedCapability, then encode_mc_capability(c)
is the concatenation of:
Bit  Bigendian bitfield 

0, 1  10 , if c.inner.access_mode == read ,11 otherwise. 
2 – 7  
encode_namespace_pk(c.inner.namespace_key)  
encode_user_pk(c.inner.user_key)  
encode_namespace_sig(c.inner.initial_authorisation)  
the length of c.inner.delegations, encoded as an unsigned, bigendian compact_width(the length of c.inner.delegations) byte integer, or the empty string, if the length of c.inner.delegations is less than or equal to 59  
for each triplet (area, pk, sig) in c.inner.delegations the concatenation of:
