Comparison to Other Protocols

IPFS, Veilid, Content-Addressed Storage
Scuttlebutt, Hypercore, Append-Only Logs
Nostr
ActivityPub and the Fediverse
Earthstar
General Conclusions

An illustration of a red apple, a green apple, and an orange. How does Willow compare conceptually and technically to other distributed systems and protocols? Here, we give an overview regarding some popular11As of early 2025. projects.

IPFS, Veilid, Content-Addressed Storage

IPFS, Veilid, and similar systems are fundamentally about providing and retrieving content-addressed data. You give the system a hash, and it answers with the original data. Ingesting data into the system requires no information but the data itself. Willow, on the other hand, employs explicit naming of data. Rather than ingesting data into a system, you bind data to a freely chosen name. Retrieving data consists of supplying a name and receiving the value that was most recently bound to that name.

Pure content-addressing faces a bootstrapping problem: in order to actively distribute data to others, I must first inform them about the hash of that data. Doing so via the system itself requires me to send the hash of the hash first, and so on. For every new piece of data, I must eventually leave the system and transmit a new hash out-of-band. Willow functions differently: after agreeing on an initial name, I can continuously bind new data to the name, which automatically reaches those interested.

Furthermore, Willow's names are meaningful and hierarchical. Hashing individual moves of an unfolding chess match yields completely unrelated hashes. In Willow, I can assign related names (Paths) to the moves: chessme_vs_alexmove1, chessme_vs_alexmove2, et cetera. The synchronisation API makes use of the hierarchy; I can requests all moves in this game (names starting with chessme_vs_alex), or all chess games in total (names starting with chess).

This example showcases another difference: names are chosen by someone in a specific context. My names need not make sense to Alex in KentuckySorry, Alex!. Willow does not aim for a globally meaningful data pool, but for organising data in a way that is meaningful to certain groups of people. Hashes, in contrast, are universal. With IPFS you query “the system” for data, with Willow you query peers with whom you share some context.

Content-addressed data excels at immutability, statelessness, and global connectivity. Willow embraces mutability and statefulness, locality, and structure. Just like functional and imperative programming, there is a place for both. As the landscape of modern peer-to-peer applications matures, we expect content-addressed systems and Willow to complement each other. Lauding either as universally superior misses the point.

Scuttlebutt, Hypercore, Append-Only Logs

Secure Scuttlebutt, Hypercore, and similar systems arrange data into linked hash chains, and identify these lists (“logs”) via the public keys of a digital signature schemeIn principle, you can also design a log-based system to use capabilities instead of public keys. As of writing, there exists no popular such system, however.. Subscribing to a log amounts to requesting data without knowing the hash of that data, possibly even data that does not yet exist at the time of subscription. This elegantly tackles the bootstrapping problem faced by purely content-addressed systems.

Data replication is more efficient and significantly more simple than for Willow: for each log, peers simply exchange the number of entries they have in that log, and whoever sent a greater number then transmits all entries the other peer is missing.

A hash chain of data cryptographically authenticates that authors never produce22Or, more precisely, it ensures that such misbehavior can be detected efficiently and arbitrarily long after the fact. differing extensions of the same log. In itself, this can be a laudable feature, but it precludes the option of concurrent writes from multiple devices. With 3d range-based set reconciliation, Willow can achieve efficient data synchronisation without the need to restrict updates to a linear sequence. We do not need to enforce linearity to keep sync complexity manageable, so we opt for effortless multi-writer support instead.

The need to verify the hash chain also makes it difficult to edit or delete (meta-) data. This can be a boon for some use-cases, but also quite dangerous when employed carelessly. Furthermore, data storage may be cheap33As of writing, that is., but designing systems for unbounded growth tends to backfire sooner or later. Willow deliberately avoids any cryptographic proofs of completeness, allowing for traceless data removal when the need arises.

Nostr

Nostr appears to be quite similar to Willow at a cursory glance. Willow peers store and exchange sets of Entries, Nostr peers store and exchange sets of events. Every Entry has a timestamp, every event has a created_at time. Every Entry belongs to a single subspace, every event belongs to a single pubkey. Willow organises Entries by their paths, Nostr organises events by their tags.

While subspaces are more of a straightforward generalisation of public keys rather than a conceptual difference, the difference between paths and tags is significant. Paths allow Entries to overwrite each other, whereas regular tags do not. This difference is a symptom of a significant conceptual difference between the protocols: Willow is based on naming data, Nostr is not. In Nostr, events are content-addressed — despite the superficial similarities, Nostr is ultimately closer to Scuttlebutt than to Willow.

Willow has more generalist ambitions than Nostr, we focus on providing a general-purpose syncing primitive with precisely specified semantics and properties such as eventual consistency. Nostr takes a much more laissez-faire approach, making it significantly easier to implement, but arguably also more difficult to solidly build upon.

Furthermore, the Willow design takes great care to ensure that all necessary operations on Willow data are efficiently supported by appropriate data structures, whereas Nostr appears more happy to quickly get things working on smaller scales. To give an example, various Path handling tasks can be tackled via radix trees, ", whereas the exponential state space of arbitrary combinations of tags is much harder to tame. Similarly, Nostr was not designed with an efficient set reconciliation protocol in mindAlthough there is work to integrate a proper set reconciliation protocol., despite ultimately being a protocol for exchanging sets of events.

ActivityPub and the Fediverse

ActivityPub, like Willow, provides proper mutability by referring to data via user-selected names. Unlike Willow, however, these names work by addressing a particular server that supplies the data. This introduces a certain brittleness into the system, and forces server operators to take on responsibilities that can be hard to properly assess in advance. The projects are similar in spirit, but Willow deliberately aims to eliminate the bottleneck of server operators.

The owned namespaces of Meadowcap were designed to mimic the curative function of a fediverse instance host without tying it to computational resources (and capability delegation makes it simple to delegate and spread the cognitive load). Beyond this, communal namespaces provide a model that cannot realistically be replicated in the fediverse (unless everybody ran their own server).

Earthstar

If Earthstar feels very similar to Willow, then that is no coincidence. Willow started out as a reimagining of Earthstar, future Earthstar versions will build upon Willow, and the core maintainer of Earthstar is one of the two Willow authors.

General Conclusions

There exist many other projects in this design space, we cannot list them all. These are the ones that have informed the design of Willow or that came up as points of reference the most often.

Among all these projects, we consider Willow to have the strongest story for mutability, as Willow is the only one to completely eschew content-addressing on the protocol level. Further improving the deletion story is prefix pruning which is a feature we have not seen in any other protocol we are aware of.

Another group of features that we consistently miss in other protocols are those around capabilities. Willow’s use of write capabilities rather than public keys allows for flexible delegation. Few if any protocols devote the attention to controlling data propagation through read capabilities, and even fewer extend this care to metadata privacy by incorporating private intersection techniques.

The story for end-to-end encryption is considerably more difficult for Willow than for content-addressed systems where you can simply use hashes of data that happens to be encrypted. Paths allow for comparatively rich semantics for data organisation and partial syncing, even when end-to-end encrypted. This combination of strucure and end-to-end encryption is one we did not find in preexisting protocols.

Ultimately, Willow takes on more complexity than many other protocols, but it also gives greater expressivity to its users. Organising Entries by SubspaceIds, user-defined Paths, and Timestamps is a choice motivated by maximising expressivity while still allowing for efficient (partial) sync, end-to-end encryption, and principled implementation techniques. All in all, we think it is pretty neat!