# Earthstar Specification

Status: Proposal (as of 03.06.2024)

This document specifies version 6 of the Earthstar protocol. The protocol behind Earthstar is an instantiation of Willow: the Earthstar data model is a particular instantiation of the Willow data model, using an instantiation of Meadowcap for access control, and synchronising data with the WGPS.

We assume familiarity with Willow and specify Earthstar by giving instantiations of all of Willow's protocol parameters.

## Preliminaries

Some up-front definitions before we can dive into the protocol parameters.

### Cinn25519

Earthstar uses non-standard signature schemes that augment each public key with a human-readable string to aid in identifying keys. These schemes are parameterised by a minimum and maximum length of the string.

The cinn25519 signature scheme of a given min_length and max_length (both are natural numbers, with `min_length <= max_length`), also written as cinn25519<min_length, max_length>, is a signature scheme based on ed25519.

A secret key of cinn25519<min_length, max_length> is a pair of an ed25519 secret key, called the underlying secret key, and a sequence of at least min_length and at most max_length characters from the ASCII ranges `0x30` to `0x39` inclusive (`0123456789`) and `0x61` to `0x7a` inclusive (`abcdefghijklmnopqrstuvwxyz`) that does not start with a numeric character, called the shortname. A corresponding public key is a pair of a corresponding ed25519 public key (called the underlying public key) and the same shortname.
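
The shortname rules can be checked mechanically. The following is a minimal sketch, not part of the specification; the function name `is_valid_shortname` is hypothetical.

```python
import re


def is_valid_shortname(shortname: str, min_length: int, max_length: int) -> bool:
    """Check a shortname against the rules of cinn25519<min_length, max_length>."""
    # Length must lie within the scheme's bounds.
    if not (min_length <= len(shortname) <= max_length):
        return False
    # Only ASCII digits (0x30-0x39) and lowercase letters (0x61-0x7a) are allowed.
    if re.fullmatch(r"[a-z0-9]*", shortname) is None:
        return False
    # The shortname must not start with a numeric character.
    return not shortname[:1].isdigit()
```

For instance, `suzy` would be accepted for cinn25519<4, 4>, while `4uzy` and `Suzy` would be rejected.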

The type of signatures is the same as for ed25519, but the signing algorithm differs: to sign a bytestring b, compute the ed25519 signature over the concatenation of the following strings:

- The shortname, encoded as ascii,
- if `min_length < max_length`, the byte `0x00`, otherwise the empty string, and
- b.

Accordingly, you verify a signature by calling the ed25519 verification function for the same concatenation.
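
As an illustration, the bytestring that is handed to ed25519 for signing or verification could be assembled as follows. This is a sketch only: the function name `cinn_signing_input` is hypothetical, and the actual ed25519 call is left to whatever library an implementation uses.

```python
def cinn_signing_input(shortname: str, min_length: int, max_length: int, b: bytes) -> bytes:
    """Build the bytes over which the underlying ed25519 signature is computed."""
    # The 0x00 byte is only present when the shortname length can vary.
    separator = b"\x00" if min_length < max_length else b""
    return shortname.encode("ascii") + separator + b
```

Under cinn25519<4, 4> the shortname has a fixed length, so no separator byte is added; under cinn25519<1, 15> the `0x00` byte marks where the shortname ends.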

We define encode_cinn_pk as the function that maps a cinn25519<min_length, max_length> public key to the concatenation of

- The shortname, encoded as ascii,
- if `min_length < max_length`, the byte `0x00`, otherwise the empty string, and
- the underlying public key (which is a sequence of bytes).
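
A possible rendering of encode_cinn_pk is sketched below. The `CinnPublicKey` type is hypothetical, and the sketch assumes that the `0x00` byte is emitted only when `min_length < max_length`, mirroring the signing rule.

```python
from typing import NamedTuple


class CinnPublicKey(NamedTuple):
    """Hypothetical container for a cinn25519 public key."""
    shortname: str
    underlying: bytes  # the 32-byte ed25519 public key


def encode_cinn_pk(pk: CinnPublicKey, min_length: int, max_length: int) -> bytes:
    """Concatenate shortname, optional 0x00 terminator, and underlying key."""
    # Assumption: the terminator is only needed for variable-length shortnames.
    terminator = b"\x00" if min_length < max_length else b""
    return pk.shortname.encode("ascii") + terminator + pk.underlying
```

With these assumptions, a cinn25519<4, 4> key always encodes to 36 bytes, and a cinn25519<1, 15> key encodes to between 34 and 48 bytes.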

### Identifiers

Two concepts in Earthstar use cinn25519 as identifiers.

An identity identifier is a cinn25519<4, 4> public key.

A namespace identifier is a cinn25519<1, 15> public key.

#### Tag encodings

Identifiers have (optional) tag encodings, which encode identifiers as more legible strings, e.g. `@suzy.b3kxcquuxuckzqcovqhtk32ncj6aiixk46zg6pkfocdkhpst4selq`.

To encode an identifier to a tag,

- let sigil be `@` if the identifier is an identity identifier, or `+` if the identifier is a namespace identifier for a communal namespace, or `-` if the identifier is a namespace identifier for an owned namespace,
- let shortname be the identifier's shortname,
- let b32_pub_key be the identifier's underlying public key, encoded as an RFC4648 Base32 string without padding characters and prepended by the character `b`,
- and interpolate them into a single string of the format `{sigil}{shortname}.{b32_pub_key}`.
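
The steps above can be sketched in a few lines. The function name `encode_tag` and the `kind` parameter are hypothetical; the caller is assumed to already know whether a namespace is communal or owned. Lowercasing the Base32 output is also an assumption, inferred from the lowercase example tag rather than stated by the text.

```python
import base64

# Sigils per kind of identifier.
SIGILS = {"identity": "@", "communal": "+", "owned": "-"}


def encode_tag(kind: str, shortname: str, underlying_pk: bytes) -> str:
    """Encode an identifier as a tag of the form {sigil}{shortname}.{b32_pub_key}."""
    sigil = SIGILS[kind]  # raises KeyError for an unknown kind
    # RFC 4648 Base32, with the '=' padding stripped.
    b32 = base64.b32encode(underlying_pk).decode("ascii").rstrip("=")
    # Assumption: lowercase output, as in the example tag; prefix with 'b'.
    return f"{sigil}{shortname}.b{b32.lower()}"
```

A 32-byte underlying public key yields 52 Base32 characters, which matches the length of the example tag.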

## Data Model

The Earthstar data model is that of Willow, with the following choices of protocol parameters.

The type NamespaceId is the type of namespace identifiers.

The type SubspaceId is the type of identity identifiers.

The max_component_length is 64, the max_component_count is 16, and the max_path_length is 1024.
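
To make these limits concrete, here is a small sketch of a path check. It assumes the usual Willow reading of the parameters (bytes per component, number of components, and total bytes across all components); the function name `is_valid_path` is hypothetical.

```python
from typing import Sequence

MAX_COMPONENT_LENGTH = 64  # bytes per component
MAX_COMPONENT_COUNT = 16   # components per path
MAX_PATH_LENGTH = 1024     # total bytes over all components


def is_valid_path(path: Sequence[bytes]) -> bool:
    """Check a path (a sequence of bytestrings) against the Earthstar limits."""
    return (
        len(path) <= MAX_COMPONENT_COUNT
        and all(len(component) <= MAX_COMPONENT_LENGTH for component in path)
        and sum(len(component) for component in path) <= MAX_PATH_LENGTH
    )
```

Since 16 × 64 = 1024, the total-length check can never fail on its own with these particular values; it is kept so the sketch stays correct if the parameters change.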

The type PayloadDigest is the type of unsigned 256-bit integers; the total order we use is the numeric one.

The hash_payload function is BLAKE3, with a digest length of 256 bits.

The type AuthorisationToken and the is_authorised_write function are determined by Meadowcap, whose instantiation we describe in the next section.

## Meadowcap

The namespace_signature_scheme is cinn25519<1, 15>.

The encode_namespace_pk function is encode_cinn_pk.

The encode_namespace_sig function maps a namespace_signature_scheme signature (which is just an ed25519 signature, which is just a sequence of bytes) to itself.

The user_signature_scheme is cinn25519<4, 4>.

The encode_user_pk function is encode_cinn_pk.

The encode_user_sig function maps a user_signature_scheme signature (which is just an ed25519 signature, which is just a sequence of bytes) to itself.

The choices for the Meadowcap max_component_length, max_component_count, and max_path_length are the same as those for the data model max_component_length, max_component_count, and max_path_length.

These choices of parameters make the Meadowcap instantiation compatible with the data model instantiation.

## WGPS

Protocol parameters of the WGPS.

### Access Control

The type ReadCapability is the type of McCapabilities (as instantiated above) with an access mode of read. Consequently, the type Receiver is the type of identity identifiers, and the type SyncSignature is that of cinn25519<4, 4> signatures.

The challenge_length is 16 (yielding 128 bits for each access challenge).

The challenge_hash_length is 32 (yielding 256-bit digests), and the challenge_hash function is BLAKE3, with a digest length of 256 bits.

### Private Area Intersection

The type PsiGroup is the type of curve points of Edwards25519, i.e., the twisted Edwards curve used by ed25519.

The type PsiScalar is the type of Ed25519 scalars.

The psi_scalar_multiplication function is scalar multiplication in Edwards25519.

The hash_into_group function encodes a fragment using the encode_fragment function that we define below, then uses the encoding as input to edwards25519_XMD:SHA-512_ELL2_RO_ with the ascii encoding of the string `earthstar6i` as the domain separation tag.

We define the encode_fragment function as follows:

- Encode a fragment `(namespace, pre)` as the concatenation of
  - encode_cinn_pk(namespace), and
  - encode_path(pre).
- Encode a fragment `(namespace, subspace_id, pre)` as the concatenation of
  - encode_cinn_pk(namespace),
  - encode_user_pk(subspace_id) as defined in the parameterisation of Meadowcap, and
  - encode_path(pre).

The type SubspaceCapability is the type of McSubspaceCapabilities for our instantiation of Meadowcap. So in particular, the type SubspaceReceiver is that of identity identifiers, and the type SyncSubspaceSignature is that of user_signature_scheme signatures.

### 3d Range-Based Set Reconciliation

The type Fingerprint is the type of 32 byte arrays that are valid encodings of Edwards25519 curve points, the type PreFingerprint is the type of Edwards25519 curve points with “cleared cofactor” (i.e., the codomain of edwards25519_XMD:SHA-512_ELL2_RO_), and the fingerprint_finalise function encodes a curve point according to RFC 8032.

The fingerprint_singleton function encodes a LengthyEntry using the encode_lengthy_entry function that we define below, then uses the encoding as input to edwards25519_XMD:SHA-512_ELL2_RO_ with the ascii encoding of the string `earthstar6u` as the domain separation tag.

We define the encode_lengthy_entry function as mapping a LengthyEntry le to the concatenation of:

- le.available, encoded as an unsigned, big-endian `compact_width(le.available)`-byte integer, and
- encode_entry(le.entry).
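
A sketch of this encoding follows. It assumes that Willow's compact_width returns the smallest of 1, 2, 4, or 8 bytes that can hold the value, and it takes the result of encode_entry as an already-encoded bytestring (the `encoded_entry` parameter is hypothetical) rather than reimplementing it.

```python
def compact_width(n: int) -> int:
    """Smallest width in bytes among 1, 2, 4, 8 that can represent n (assumed definition)."""
    if n < 0:
        raise ValueError("value must be non-negative")
    for width in (1, 2, 4, 8):
        if n < 1 << (8 * width):
            return width
    raise ValueError("value does not fit in 64 bits")


def encode_lengthy_entry(available: int, encoded_entry: bytes) -> bytes:
    """Prefix the encoded entry with le.available as a big-endian integer."""
    return available.to_bytes(compact_width(available), "big") + encoded_entry
```

Note that the width itself is not written into the output here; the text above only asks for the concatenation of the two parts.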

The fingerprint_combine function is addition of curve points of the Edwards25519 curve.

The fingerprint_neutral value is the neutral element of the Edwards25519 curve.

### Other Parameters

The decomposition of AuthorisationTokens into StaticToken and DynamicToken is as recommended for Meadowcap in the WGPS: StaticToken is the type McCapability, and DynamicToken is the type of user_signature_scheme signatures.

The transform_payload algorithm deterministically maps each Payload to its Bao Combined Encoding, excluding the first eight bytes of the combined encoding (which would encode the length).

The default_namespace_id is the namespace identifier whose shortname is `a` and whose underlying public key consists of zero-bytes only.

The default_subspace_id is the identity identifier whose shortname is `a000` and whose underlying public key consists of zero-bytes only.

The default_payload_digest is the sequence of 256 zero bits.

### Encoding Parameters

Whenever any encoding function needs to encode a cinn25519 public key, use encode_cinn_pk. Whenever any encoding function needs to encode a signature or a digest, just use the signature or the digest itself (they already are sequences of bytes).

The encode_group_member function encodes each PsiGroup member (i.e., each Edwards25519 curve point) according to RFC8032.

The encode_subspace_capability function is encode_mc_subspace_capability, except you omit encoding the namespace_key.

The encode_sync_subspace_signature function maps each SyncSubspaceSignature (i.e., each ed25519 signature, which is already a sequence of bytes) to itself.

The encode_read_capability function is encode_mc_capability, except you omit encoding the namespace_key.

The encode_sync_signature function maps each SyncSignature (i.e., each ed25519 signature, which is already a sequence of bytes) to itself.

The total order on SubspaceId (i.e., on identity identifiers) orders by shortname first (lexicographically), and by underlying public key second (again lexicographically). This ordering fulfils the necessary properties, and default_subspace_id is indeed the unique least element.
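
This ordering maps directly onto tuple comparison. In the sketch below, identity identifiers are represented as hypothetical `(shortname, underlying_public_key)` pairs, and `subspace_sort_key` is an illustrative name.

```python
from typing import Tuple


def subspace_sort_key(identifier: Tuple[str, bytes]) -> Tuple[bytes, bytes]:
    """Sort key: shortname bytes first, underlying public key second."""
    shortname, underlying = identifier
    # Python compares bytes lexicographically by unsigned byte value.
    return (shortname.encode("ascii"), underlying)


DEFAULT_SUBSPACE_ID = ("a000", bytes(32))

identifiers = [
    ("suzy", bytes([1]) * 32),
    ("a00a", bytes(32)),
    ("a000", bytes(31) + bytes([1])),
    DEFAULT_SUBSPACE_ID,
]
ordered = sorted(identifiers, key=subspace_sort_key)
```

The default comes first because `a` is the smallest character allowed at the start of a shortname, `0` is the smallest character allowed afterwards, and an all-zero key is the least 32-byte string.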

The encode_static_token function is encode_mc_capability, encoding relative to the full area.

The encode_dynamic_token function maps each DynamicToken (i.e., each ed25519 signature, which is already a sequence of bytes) to itself.

The encode_fingerprint function maps each Fingerprint (which is already a sequence of bytes) to itself.

### Bao Integration

In addition to these parameters, Earthstar integrates Bao verified streaming in a way that slightly stretches the intended semantics of the WGPS. The WGPS has several messages that require peers to specify an offset in a payload as an unsigned integer. Earthstar changes the semantics of that integer: instead of a payload offset, those messages give an offset into the depth-first numbering of the Blake3 tree of the payload. This offset must then be converted into an offset of the Bao combined encoding, to determine where in the transformed payload (i.e., the combined encoding of the payload sans the first eight length-bytes) to resume transmission.

Consider the example from the Bao spec for a payload of 2049 zero bytes (two full chunks and a third chunk with just one byte):

| root parent node | left parent node | first chunk | second chunk | last chunk |
| --- | --- | --- | --- | --- |
| a04fc7...c37466... | 91715a...f0eef3... | 000000... | 000000... | 00 |

A pre-order offset of `0` corresponds to byte zero (the start of the root parent node), a pre-order offset of `1` corresponds to byte 64 (the start of the left parent node), a pre-order offset of `2` corresponds to byte 128 (the start of the first chunk), a pre-order offset of `3` corresponds to byte 1152 (the start of the second chunk), and a pre-order offset of `4` corresponds to byte 2176 (the start of the last chunk). It is impossible to specify positions *inside* a parent node or chunk.

We will add offset conversion formulae here once we get to implementing this ourselves. Right now, the Earthstar implementation is a beta version that performs no payload transformations. If you want to implement Bao support for Earthstar/Willow, whether in an implementation of your own or in the reference implementation, please reach out.
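
As an unofficial illustration of the conversion, the sketch below walks the tree in pre-order and records where each node starts in the combined encoding without its length prefix. It assumes the Bao layout of 64-byte parent nodes and chunks of up to 1024 bytes, with the left subtree holding the largest power-of-two number of chunks that leaves the right subtree non-empty. The function names are hypothetical.

```python
from typing import List

CHUNK_SIZE = 1024  # bytes per full chunk
PARENT_SIZE = 64   # bytes per parent node (two 32-byte hashes)


def bao_node_offsets(payload_len: int) -> List[int]:
    """Byte offset of every tree node, listed in pre-order."""
    offsets = []
    pos = 0
    stack = [payload_len]  # payload lengths of subtrees still to visit
    while stack:
        length = stack.pop()
        offsets.append(pos)
        if length <= CHUNK_SIZE:
            pos += length  # a chunk: its bytes follow directly
        else:
            chunks = -(-length // CHUNK_SIZE)  # ceiling division
            left_len = (1 << ((chunks - 1).bit_length() - 1)) * CHUNK_SIZE
            pos += PARENT_SIZE
            stack.append(length - left_len)  # right subtree, visited second
            stack.append(left_len)           # left subtree, visited first
    return offsets


def pre_order_to_byte_offset(payload_len: int, pre_order_offset: int) -> int:
    """Convert a pre-order node offset into a byte offset of the transformed payload."""
    offsets = bao_node_offsets(payload_len)
    if not 0 <= pre_order_offset < len(offsets):
        raise ValueError("pre-order offset out of range")
    return offsets[pre_order_offset]
```

For the 2049-byte example, the node offsets come out as 0, 64, 128, 1152, and 2176.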

The following fields of messages use pre-order-offset semantics instead of payload-byte-offset semantics in Earthstar:

## Friendly paths

While Willow's Paths are defined as sequences of bytestrings, Earthstar defines a subset of these as human-readable friendly paths.

A path is considered friendly if every byte of its bytestrings belongs to the set of ascii encodings of the following characters: `-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_`, that is, alphanumerics and `-`, `.`, and `_`.

This makes it possible to provide legible encodings of paths, e.g. `/blog/recipes/chocolate_pizza`, and to input paths using a keyboard.
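
A friendliness check could look like the following sketch, where a path is represented as a sequence of bytestrings and the name `is_friendly` is hypothetical.

```python
from typing import Sequence

# The ascii encodings of the allowed characters.
FRIENDLY_BYTES = frozenset(
    b"-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_"
)


def is_friendly(path: Sequence[bytes]) -> bool:
    """True if every byte of every component is in the friendly set."""
    return all(byte in FRIENDLY_BYTES for component in path for byte in component)
```

A path with the components `blog`, `recipes`, and `chocolate_pizza` passes this check, whereas a component containing `/` or any non-ASCII byte does not.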