Willow URIs
Status: Proposal
This specification describes the willow:// URI scheme for identifying Willow resources in a standardised, human-readable form. More specifically, each willow:// URI identifies either an Entry (optionally together with a single contiguous slice of its Payload), or an AreaOfInterest in some namespace.
At a glance:
- Entries: willow://namespace.subspace/path/components?select_features#fragment
- Entry example: willow://family.alfie/blog/idea.txt?from=5&digest=b287af#about
- Areas: willow://namespace.subspace/path/components?area&select_features#fragment
- Area example: willow://family.alfie/blog?area&count=12
Preliminaries
We say a byte is a sub-delimiter if its numeric value is the ASCII code of one of the following characters: !$&'()*+,;= (this definition agrees with the sub-delim production of RFC 3986).
We say a byte is unreserved if its numeric value is the ASCII code of one of the following characters: -.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_~ (i.e., alphanumerics and -, ., _, and ~). This definition agrees with the unreserved production of RFC 3986.
We say a byte is uri_reserved if it is not unreserved.
We say a sequence of three bytes is a percent encoding if its first byte is 37 (ASCII %) and its second and third bytes are ASCII codes of hex digits (any of 0123456789abcdefABCDEF). This definition agrees with the pct-encoded production of RFC 3986. The two hex digits encode the value of some arbitrary byte. For consistency, you should use uppercase hex digits, as RFC 3986 suggests.
We denote by percent_encode the function which maps a bytestring to the bytestring obtained by replacing with their percent encodings all uri_reserved bytes.
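As a minimal sketch of the definition above, percent_encode could look as follows in Python. The function and constant names mirror the prose; the choice of uppercase hex digits follows the recommendation above rather than a hard requirement.

```python
# Bytes that percent_encode leaves untouched: ASCII alphanumerics and - . _ ~
UNRESERVED = frozenset(
    b"-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_~"
)


def percent_encode(data: bytes) -> bytes:
    """Replace every uri_reserved byte with its percent encoding (uppercase hex)."""
    out = bytearray()
    for byte in data:
        if byte in UNRESERVED:
            out.append(byte)
        else:
            out += f"%{byte:02X}".encode("ascii")
    return bytes(out)
```

Applied to the hint URI wgps://worm-blossom.org:1234/example, this replaces each : and / with %3A and %2F, which is the form hints take in the examples further below.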
We say a sequence of bytes is host-safe if it consists solely of unreserved bytes, sub-delimiter bytes, or percent encodings. This definition agrees with the reg-name production of RFC 3986.
We say a sequence of bytes is query-safe if it consists solely of unreserved bytes, sub-delimiter bytes, percent encodings, or the ASCII code of any of the following characters: /?:@ (this definition agrees with the query production of RFC 3986).
We say a sequence of bytes is fragment-safe if it is query-safe. This definition agrees with the fragment production of RFC 3986.
We denote by fragment_encode the function which maps a bytestring to the bytestring obtained by replacing with their percent encodings all bytes which are neither unreserved bytes, nor sub-delimiter bytes, nor the ASCII code of any of the characters /?:@, nor part of a percent encoding.
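The following Python sketch illustrates fragment_encode and a matching fragment-safety check. It assumes a left-to-right scan in which a % byte followed by two hex digits counts as an existing percent encoding and is kept as-is; is_fragment_safe is a helper name introduced here for illustration.

```python
HEX_DIGITS = frozenset(b"0123456789abcdefABCDEF")
UNRESERVED = frozenset(
    b"-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_~"
)
SUB_DELIMITERS = frozenset(b"!$&'()*+,;=")
# Bytes that may appear literally in a fragment (and in a query).
FRAGMENT_LITERALS = UNRESERVED | SUB_DELIMITERS | frozenset(b"/?:@")


def fragment_encode(data: bytes) -> bytes:
    """Percent-encode every byte that is neither a literal nor part of a percent encoding."""
    out = bytearray()
    i = 0
    while i < len(data):
        triple = data[i:i + 3]
        if (len(triple) == 3 and triple[0] == 0x25
                and triple[1] in HEX_DIGITS and triple[2] in HEX_DIGITS):
            out += triple  # already a percent encoding: copy verbatim
            i += 3
        elif data[i] in FRAGMENT_LITERALS:
            out.append(data[i])
            i += 1
        else:
            out += f"%{data[i]:02X}".encode("ascii")
            i += 1
    return bytes(out)


def is_fragment_safe(data: bytes) -> bool:
    """A bytestring is fragment-safe exactly if fragment_encode leaves it unchanged."""
    return fragment_encode(data) == data
```

Because fragment-safe and query-safe coincide, the same check can serve for both.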
Willow URIs extend regular Willow Paths with the dot-segments of RFC 3986 (dot-segments are fairly useless, as they do not add any expressivity, but RFC 3986 requires supporting them):
- the two dot-segments are . and ..,
- a URIPathComponent is either a regular Component or a dot-segment, and
- a URIPath is a sequence of URIPathComponents which, using the procedure defined below, converts into a valid Path (i.e., respecting max_component_count and max_path_length).
To convert a URIPath into the Willow Path it identifies, repeat the following action for the first (“leftmost”) dot-segment until no dot-segments remain (this procedure agrees with that of RFC 3986):
- If the dot-segment is ., remove it.
- If the dot-segment is .. and the very first URIPathComponent of the (remaining) URIPath, remove it.
- If the dot-segment is .. but not the very first URIPathComponent of the (remaining) URIPath, remove both it and the preceding Component.
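The conversion procedure can be sketched in Python as follows. The DotSegment enum and the representation of a URIPath as a list of bytes objects and DotSegment values are assumptions made for illustration, chosen so that a regular Component whose bytes happen to be "." stays distinguishable from a dot-segment. The sketch does not check max_component_count or max_path_length.

```python
from enum import Enum
from typing import List, Union


class DotSegment(Enum):
    CURRENT = "."
    PARENT = ".."


URIPathComponent = Union[bytes, DotSegment]


def resolve_uri_path(uri_path: List[URIPathComponent]) -> List[bytes]:
    """Repeatedly eliminate the leftmost dot-segment until none remain."""
    remaining = list(uri_path)
    while True:
        index = next(
            (i for i, comp in enumerate(remaining) if isinstance(comp, DotSegment)),
            None,
        )
        if index is None:
            return remaining  # only regular Components are left
        if remaining[index] is DotSegment.CURRENT or index == 0:
            # "." anywhere, or ".." at the very start: remove just the dot-segment.
            del remaining[index]
        else:
            # ".." elsewhere: remove it together with the preceding Component.
            del remaining[index - 1:index + 1]
```

For instance, the URIPath blog/./ideas/.. resolves to the single Component blog, and so does chess/../../../blog.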
Parameters
In order to work with Willow URIs, one must first specify a full suite of instantiations of the parameters of the core Willow data model (see Willow’25 for a default recommendation of parameters). Additionally, one needs:
- An encoding relation URIEncodeNamespaceId for NamespaceId, such that all codes are host-safe.
- An encoding relation URIEncodeSubspaceId for SubspaceId, such that all codes are host-safe.
- An encoding relation URIEncodePayloadDigest for PayloadDigest, such that all codes are query-safe.
URI Semantics
We first specify the information contained by each Willow URI, independent from URI syntax. Note that URIs always provide an absolute reference; relative identifiers (think relative hyperlinks in HTML) are not URIs but URI references, whose semantics we inherit for free.
Each Willow URI identifies either an Entry (plus, optionally, a single contiguous slice of its Payload), or an AreaOfInterest in some namespace. We call such Willow URIs Entry URIs and Area URIs respectively.
Each Willow URI contains a NamespaceId. This identifies the namespace_id of an Entry, or the namespace in which to locate an AreaOfInterest.
Each Willow URI contains a SubspaceId. This identifies the subspace_id of an Entry, or the subspace_id of an AreaOfInterest.
Each Willow URI contains a URIPath. This identifies the path of an Entry, or the path of an AreaOfInterest.
Each Willow URI contains a (possibly empty) sequence of URIs (typically not Willow URIs), which serve as hints for how and/or where to obtain the identified data. Hints appearing early in the sequence should be tried before hints appearing later in the sequence. The sequence may contain duplicates, though duplicates are fairly pointless; almost as pointless as including the original Willow URI in the sequence.
Each Willow URI may optionally contain a bytestring of application-specific data. The content of this fragment is opaque to this specification; the intended scope and functionality are detailed in Section 3.5 of RFC 3986.
Entry URI Semantics
Each Entry URI optionally contains an expected PayloadDigest. Such an Entry URI can only identify Entries with that exact payload_digest. In other words, this feature can be used to identify an Entry with a specific Payload, filtering out all differently-payloaded Entries which have been or will have been written in the same namespace with the same subspace_id and path. Note that conformant stores will delete the overwritten Entry, and using an Entry URI will not “pin” it in any way. This feature makes retrieval fail deliberately if the intended Entry got pruned; it does not allow keeping pruned data around.
Each Entry URI optionally contains a starting index (a U64) and optionally contains an end index (also a U64), to identify a specific subslice of the Payload of the Entry in addition to the Entry itself. The semantics are as follows:
- If neither index is present, the Entry URI identifies only an Entry but not its Payload.
- If both indices are present, the Entry URI identifies not only an Entry, but also the zero-indexed subslice of its Payload, starting at the start index (inclusive) and up until the end index (exclusive). If the start index is strictly greater than the end index, treat the Entry URI as if its end index were set to the start index instead. Indices strictly greater than the Payload length are truncated down to the exact Payload length for this purpose: giving an end index which is too large identifies the slice extending to the very end of the Payload, and giving a start index which is too large identifies the empty slice, positioned at the very end of the Payload.
- If only one index is present, the other defaults to zero (for the start index) or the length of the Payload (for the end index), and then the preceding case applies.
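The three cases above can be sketched as a small Python function. The name resolve_payload_slice and the use of None for an absent index are assumptions for illustration; the function returns None when the Entry URI identifies no Payload slice, and otherwise a (start, end) pair with end exclusive.

```python
from typing import Optional, Tuple


def resolve_payload_slice(
    start: Optional[int], end: Optional[int], payload_length: int
) -> Optional[Tuple[int, int]]:
    """Return the identified Payload slice as (start, end), or None if no slice is identified."""
    if start is None and end is None:
        return None  # the URI identifies only the Entry, not its Payload
    if start is None:
        start = 0
    if end is None:
        end = payload_length
    if start > end:
        end = start  # a reversed range is treated as the empty slice at start
    # Indices beyond the Payload are truncated to the Payload length.
    return (min(start, payload_length), min(end, payload_length))
```

With a 50-byte Payload, from=99&to=12 yields the empty slice (50, 50) positioned at the very end of the Payload, while to=17 alone yields (0, 17).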
Summarising as a data type:
The data of an Entry URI (EntryURI):
- The namespace_id of the identified Entry.
- The subspace_id of the identified Entry.
- The URIPath that converts into the path of the identified Entry.
- The resolution hints for the Entry URI.
- The optional, application-specific extra data associated with this Entry URI.
- The optional expected PayloadDigest.
- The optional start index of the Payload slice.
- The optional end index of the Payload slice.
Area URI Semantics
Each Area URI contains a U64 to indicate the max_count of the identified AreaOfInterest.
Each Area URI contains a U64 to indicate the max_size of the identified AreaOfInterest.
Each Area URI contains a Timestamp to indicate the start of the times of the area of the identified AreaOfInterest.
Each Area URI optionally contains a Timestamp to indicate the end of the times of the area of the identified AreaOfInterest. If this Timestamp is not present, then the end is open.
Summarising as a data type (the first five fields are identical to those of EntryURI):
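The data of an Area URI (AreaURI):
- The NamespaceId in which to identify the AreaOfInterest.
- The subspace_id of the identified AreaOfInterest.
- The URIPath that converts into the path of the identified AreaOfInterest.
- The resolution hints for the Area URI.
- The optional, application-specific extra data associated with this Area URI.
- The max_count of the identified AreaOfInterest.
- The max_size of the identified AreaOfInterest.
- The start of the times of the identified AreaOfInterest.
- The optional end of the times of the identified AreaOfInterest.
URI Syntax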
We now define the syntax of a human-readable, RFC 3986-compatible encoding for Willow URIs, and give some example codes.
Entry URI Syntax
We define an encoding relation EncodeEntryURI for EntryURI. Let val be any EntryURI.
Let encoded_hints be the bytestring obtained by applying percent_encode to every URI in val.hints and joining the results with 59 (ASCII ;) bytes (with neither leading nor trailing ;).
Let query_component be a bytestring obtained by concatenating the following bytestrings in arbitrary order, joining non-empty ones with 38 (ASCII &) bytes (with neither leading nor trailing &):
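- If val.hints is nonempty, the ASCII bytes hints= followed by encoded_hints; otherwise the empty string.
- If val contains an expected PayloadDigest, the ASCII bytes digest= followed by any code in URIEncodePayloadDigest for that PayloadDigest; otherwise the empty string.
- If val contains a start index, the ASCII bytes from= followed by the ASCII decimal representation of that index; otherwise the empty string.
- If val contains an end index, the ASCII bytes to= followed by the ASCII decimal representation of that index; otherwise the empty string.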
Then the codes in EncodeEntryURI for val are the bytestrings that are concatenations of the following form:
- The bytes [119, 105, 108, 108, 111, 119, 58, 47, 47] (ASCII willow://).
- Any code in URIEncodeNamespaceId for val.namespace_id.
- The byte 46 (ASCII .).
- Any code in URIEncodeSubspaceId for val.subspace_id.
- For every URIPathComponent comp of val.path, a concatenation of the following form: the byte 47 (ASCII /), followed by the dot-segment itself if comp is a dot-segment, or by a percent-encoded form of comp if comp is a regular Component.
- If query_component is nonempty, the byte 63 (ASCII ?); otherwise the empty string.
- The raw bytes of query_component.
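To make the shape of the encoding concrete, here is a minimal Python sketch that assembles an Entry URI. Several things are assumptions rather than part of this specification: the function name and its parameters, the query keys hints, digest, from, and to (taken from the example URIs in this document), decimal formatting of the indices, and the requirement that the caller passes NamespaceId, SubspaceId, and PayloadDigest codes that are already host-safe or query-safe. The sketch handles regular Components only, emits no fragment, and rejects Components whose bytes are "." or ".." rather than guessing how they should be told apart from dot-segments.

```python
from typing import Optional, Sequence

UNRESERVED = frozenset(
    b"-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_~"
)


def percent_encode(data: bytes) -> bytes:
    return b"".join(
        bytes([b]) if b in UNRESERVED else f"%{b:02X}".encode("ascii") for b in data
    )


def encode_entry_uri(
    namespace_code: bytes,          # assumed: a host-safe code for the NamespaceId
    subspace_code: bytes,           # assumed: a host-safe code for the SubspaceId
    components: Sequence[bytes],    # regular Components only, no dot-segments
    hints: Sequence[bytes] = (),
    digest_code: Optional[bytes] = None,  # assumed: a query-safe code
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> bytes:
    parts = []
    if hints:
        parts.append(b"hints=" + b";".join(percent_encode(h) for h in hints))
    if digest_code is not None:
        parts.append(b"digest=" + digest_code)
    if start is not None:
        parts.append(b"from=" + str(start).encode("ascii"))
    if end is not None:
        parts.append(b"to=" + str(end).encode("ascii"))
    query_component = b"&".join(parts)

    uri = b"willow://" + namespace_code + b"." + subspace_code
    for comp in components:
        if comp in (b".", b".."):
            raise ValueError("Components equal to . or .. are not handled in this sketch")
        uri += b"/" + percent_encode(comp)
    if query_component:
        uri += b"?" + query_component
    return uri
```

Since the query parts may appear in arbitrary order, this sketch produces only one of several valid codes for a given Entry URI.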
Entry URI Examples
Here are some examples of valid Entry URIs:
- willow://family.alfie/blog/ideas
- willow://family.alfie/blog/ideas/ (the Path ends with an empty Component)
- willow://family.alfie/blog///ideas (the Path has two empty Components in the middle)
- willow://family.alfie (empty Path)
- willow://family.alfie/ (Path consists of a single empty Component)
- willow://family.alfie/blog/./ideas/.. (Path is equivalent to blog)
- willow://family.alfie/chess/../../../blog (Path is equivalent to blog)
- willow://family.alfie/blog?hints=wgps%3A%2F%2Fworm-blossom.org%3A1234%2Fexample;wtp%3A%2F%2Fworm-blossom.org%3A1235%2Fexample
- willow://family.alfie/blog?digest=b287afb0
- willow://family.alfie/blog?digest=b287afb0#blabla%0A
- willow://family.alfie/blog?from=0 (identifying an Entry and its complete Payload)
- willow://family.alfie/blog?to=17 (identifying an Entry and the first 17 bytes (bytes zero to sixteen inclusive) of its Payload)
- willow://family.alfie/blog?to=6&from=4 (identifying an Entry and its Payload bytes four and five)
- willow://family.alfie/blog?from=5&to=5 (identifying an Entry and an empty subslice of its Payload)
- willow://family.alfie/blog?from=99&to=12 (identifying an Entry and an empty subslice of its Payload)
Area URI Syntax
We define an encoding relation EncodeAreaURI for AreaURI. Let val be any AreaURI.
This encoding works mostly the same way as that for EntryURIs. The only differences are the construction of query_component, and the query always starting with ?area.
Let encoded_hints be the bytestring obtained by applying percent_encode to every URI in val.hints and joining the results with 59 (ASCII ;) bytes (with neither leading nor trailing ;).
Let query_component be a bytestring obtained by concatenating the following bytestrings in arbitrary order, joining non-empty ones with 38 (ASCII &) bytes (with neither leading nor trailing &):
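- If val.hints is nonempty, the ASCII bytes hints= followed by encoded_hints; otherwise the empty string.
- The ASCII bytes count= followed by the ASCII decimal representation of the max_count; if the max_count is zero, this may be the empty string instead.
- The ASCII bytes size= followed by the ASCII decimal representation of the max_size; if the max_size is zero, this may be the empty string instead.
- The ASCII bytes from= followed by the ASCII decimal representation of the start of the times; if the start is zero, this may be the empty string instead.
- If the end of the times is present, the ASCII bytes to= followed by the ASCII decimal representation of that end; otherwise the empty string.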
Then the codes in EncodeAreaURI for val are the bytestrings that are concatenations of the following form:
- The bytes [119, 105, 108, 108, 111, 119, 58, 47, 47] (ASCII willow://).
- Any code in URIEncodeNamespaceId for val.namespace_id.
- The byte 46 (ASCII .).
- Any code in URIEncodeSubspaceId for val.subspace_id.
- For every URIPathComponent comp of val.path, a concatenation of the following form: the byte 47 (ASCII /), followed by the dot-segment itself if comp is a dot-segment, or by a percent-encoded form of comp if comp is a regular Component.
- The bytes [63, 97, 114, 101, 97] (ASCII ?area).
- If query_component is nonempty, the byte 38 (ASCII &); otherwise the empty string.
- The raw bytes of query_component.
Area URI Examples
Here are some examples of valid Area URIs:
- willow://family.alfie/blog/ideas?area
- willow://family.alfie/blog/ideas/?area (the Path ends with an empty Component)
- willow://family.alfie/blog///ideas?area (the Path has two empty Components in the middle)
- willow://family.alfie?area (empty Path)
- willow://family.alfie/?area (Path consists of a single empty Component)
- willow://family.alfie/blog/./ideas/..?area (Path is equivalent to blog)
- willow://family.alfie/chess/../../../blog?area (Path is equivalent to blog)
- willow://family.alfie/blog?area&hints=wgps%3A%2F%2Fworm-blossom.org%3A1234%2Fexample;wtp%3A%2F%2Fworm-blossom.org%3A1235%2Fexample
- willow://family.alfie/blog?area&count=5&size=0
- willow://family.alfie/blog?area&count=5&size=0#blabla%0A
- willow://family.alfie/blog?area&from=0 (redundantly specifying the start of the TimeRange as zero; the end is open)
- willow://family.alfie/blog?area&to=17 (specifying the end of the TimeRange as 17)
- willow://family.alfie/blog?area&to=6&from=4 (specifying the TimeRange from four to six)
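Going the other direction, the following Python sketch splits a willow:// URI string into its syntactic parts and tells Entry URIs from Area URIs by the leading area marker in the query. The function name and the returned dictionary layout are assumptions for illustration. The sketch also assumes that the NamespaceId code contains no . character, so that the first . separates namespace from subspace. It performs no percent-decoding and no dot-segment resolution.

```python
from typing import Dict, List


def split_willow_uri(uri: str) -> Dict[str, object]:
    """Split a willow:// URI into namespace, subspace, path components, query features, fragment."""
    prefix = "willow://"
    if not uri.startswith(prefix):
        raise ValueError("not a willow:// URI")
    # Fragment first, then query, then path: later parts may contain / and ?.
    rest, _, fragment = uri[len(prefix):].partition("#")
    rest, _, query = rest.partition("?")
    authority, slash, path = rest.partition("/")
    namespace, dot, subspace = authority.partition(".")
    if not dot:
        raise ValueError("expected namespace.subspace before the path")

    # No slash at all means the empty Path; a lone slash means one empty Component.
    components: List[str] = path.split("/") if slash else []

    items = query.split("&") if query else []
    is_area = items[:1] == ["area"]
    features: Dict[str, str] = {}
    for item in (items[1:] if is_area else items):
        key, _, value = item.partition("=")
        features[key] = value

    return {
        "namespace": namespace,
        "subspace": subspace,
        "components": components,
        "is_area": is_area,
        "features": features,
        "fragment": fragment,
    }
```

For willow://family.alfie/blog?area&to=6&from=4 this reports an Area URI with the single raw component blog and the features to=6 and from=4.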
URI References
Willow URIs identify resources absolutely. Many applications benefit from relative addressing, in the vein of “starting from this Entry, apply the URIPath ../image.png to obtain a different Entry”. RFC 3986 provides this feature for arbitrary URIs — so, in particular, also for Willow URIs — using its concept of URI References.
Implementations of this specification should provide support both for pure Willow URIs and for URI References using Willow URIs.
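As an illustration of relative addressing, here is a rough Python sketch of reference resolution in the style of RFC 3986, Section 5.2, restricted to Willow URIs. The function names are assumptions for illustration. The sketch returns references that carry their own scheme unchanged, does not handle network-path references (those starting with //), and drops empty queries. It is written by hand instead of using urllib.parse.urljoin, because urljoin does not resolve against schemes it does not know.

```python
import re


def remove_dot_segments(path: str) -> str:
    """Dot-segment removal following the loop of RFC 3986, Section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading slash, if any) to the output.
            cut = path.find("/", 1)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


def resolve_willow_reference(base: str, ref: str) -> str:
    prefix = "willow://"
    if not base.startswith(prefix):
        raise ValueError("base must be a willow:// URI")
    if re.match(r"[A-Za-z][A-Za-z0-9+.-]*:", ref):
        return ref  # the reference has its own scheme, so it is already absolute
    if ref.startswith("//"):
        raise ValueError("network-path references are not handled in this sketch")

    base_rest, _, base_query = base[len(prefix):].partition("#")[0].partition("?")
    authority, slash, tail = base_rest.partition("/")
    base_path = slash + tail

    ref_rest, hash_mark, fragment = ref.partition("#")
    ref_path, question_mark, ref_query = ref_rest.partition("?")

    if ref_path == "":
        path = base_path
        query = ref_query if question_mark else base_query
    else:
        if ref_path.startswith("/"):
            merged = ref_path
        else:
            # Keep the base path up to and including its last slash.
            directory = base_path[:base_path.rfind("/") + 1] or "/"
            merged = directory + ref_path
        path = remove_dot_segments(merged)
        query = ref_query

    result = prefix + authority + path
    if query:
        result += "?" + query
    if hash_mark:
        result += "#" + fragment
    return result
```

Resolving ../image.png against willow://family.alfie/blog/idea.txt gives willow://family.alfie/image.png, because RFC 3986 first drops the last segment of the base path and then applies the dot-segment. Empty Components in the base path are preserved.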