A vision of a distributed package repository in WordPress

Many, if not most, WordPress users are now aware of the challenge posed by a single point of failure in the package ecosystem. Even though WordPress users are (currently) able to upload plugins directly through the user interface, distributing a plugin outside the repository that WordPress.org offers is incredibly challenging.

AspirePress exists entirely to solve this problem.

Our focus is on building a sustainable, distributed, federated model of managing and distributing packages for WordPress.

The advantage of distributed over centralized

Many of the internet’s core systems are distributed. In fact, the internet itself is a distributed system: any person can connect to the internet from anywhere and interact with essentially any other machine online today – if they know where to find it. How do they find those machines? That’s what DNS is for – another distributed system – which provides an easy translation layer from a name like example.com to an address like 93.184.215.14.

Distributed systems aren’t without their drawbacks: for example, eventual rather than instant consistency (think DNS). But they do offer one thing that we currently do not have in the WordPress ecosystem: an inability to be controlled by any single party.

While the internet surely has authorities for determining good actors from bad, it essentially allows all peers equal access to all other peers. Those authorities were an add-on, not a design feature: spam filtering and DDoS mitigation arrived as later components, not originally built-in ones.

Distributed systems are designed to prevent any one person or system from taking them over. If I drop the DNS records that I manage from the internet, the only thing that happens is that my records (and anyone else’s that I control) are lost from circulation. Anyone pointing those domains at new nameservers and replacing the missing records would effectively recreate the access that was lost.

Distributed repositories can work similarly. In a truly distributed, federated model there is no “single point of failure” or “authority”. Instead, each peer is responsible for determining the validity, acceptability, and authority of each peering node, and assigning levels of trust to the information they serve. Some peers may choose not to federate with other peers. Some peers may accept packages from peers while rejecting others.

How the ACF/SCF fiasco could have been avoided

Until now, the only source of truth for automatic updates to WordPress has been the WordPress.org API, which is hard-coded into every installation of WordPress out there. It’s not even configurable in the wp-config.php file. And with plugins leaving the WordPress repository, and being blocked/closed by WordPress.org, we now have many sources of truth, and a fractured community ecosystem.

In that model, it was simple for the ACF/SCF fiasco to unfold: replacing ACF with SCF on the same slug meant that anyone asking for updates got SCF without any kind of checksum or signature being checked. It was implicitly trusted, and accepted by WordPress.

Contrast this with a distributed system where one system goes rogue and replaces a plugin with another plugin, perhaps containing malware (SCF did not contain malware). Upon discovery of the malfeasance, the distributed network could choose to defederate with the offending repository, as well as refuse to serve that content. For a short time this would create a system where some users are at risk and others are protected (a drawback of eventual consistency). But it would eventually resolve in favor of the authentic, genuine software being distributed to everybody.
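
The defederation step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Mirror` class, its method names, and the peer and plugin names are all hypothetical, and a real mirror would track far more than a content hash.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Mirror:
    """Hypothetical mirror that remembers which peer supplied each package."""

    peers: Set[str] = field(default_factory=set)
    # slug -> (origin peer, content hash); names are illustrative only
    packages: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def ingest(self, peer: str, slug: str, content_hash: str) -> bool:
        """Accept a package only from a peer we currently federate with."""
        if peer not in self.peers:
            return False
        self.packages[slug] = (peer, content_hash)
        return True

    def defederate(self, peer: str) -> None:
        """Stop federating with a peer and stop serving what it supplied."""
        self.peers.discard(peer)
        self.packages = {
            slug: entry for slug, entry in self.packages.items() if entry[0] != peer
        }

    def serve(self, slug: str) -> Optional[str]:
        """Return the content hash we would serve for a slug, if any."""
        entry = self.packages.get(slug)
        return entry[1] if entry else None


mirror = Mirror(peers={"good-repo", "rogue-repo"})
mirror.ingest("good-repo", "forms-plugin", "hash-of-genuine-release")
mirror.ingest("rogue-repo", "fields-plugin", "hash-of-swapped-release")
mirror.defederate("rogue-repo")
```

After `defederate`, this mirror serves nothing for the slug the rogue peer supplied until a peer it still trusts provides a copy, which is the window of eventual consistency mentioned above.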

In short, the kind of supply chain attack executed recently by WordPress.org would be difficult to carry out, and there are ways of making it harder still.

Code signing and authenticity checking

One way of ensuring that users get what they’re asking for is to check – against a known good source – for authenticity.

For example, a plugin author would create a package of their asset, and then push it to the package repository of their choice along with a signature, made with their private key, over a hash of the file. The public key, known to the repository, can verify the authenticity of the signature.

Upon receipt, the repository would compute the hash of what it received and compare it with the hash the author supplied. If they match, it would validate the signature to confirm that the author really signed that hash. It would then apply its own private key signature to attest that it verified the information, and distribute the entire chain to all the other mirrors for distribution.
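
As a rough sketch of the receiving side, the check could look like the following. Everything here is an assumption made for illustration: the `Submission` fields, the choice of SHA-256, and the `SignatureCheck` callable, which stands in for a real asymmetric verification (for example Ed25519 or RSA from a vetted cryptography library) so the example stays within the Python standard library.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Submission:
    """Hypothetical upload: package bytes plus the author's claims about them."""

    package: bytes
    claimed_sha256: str  # hex digest the author says they signed
    signature: bytes  # author's signature over that digest
    author_key_id: str  # tells the repository which public key to use


# Assumed shape of a signature check: (key_id, message, signature) -> bool.
# A real repository would implement this with an asymmetric scheme.
SignatureCheck = Callable[[str, bytes, bytes], bool]


def accept_submission(sub: Submission, verify: SignatureCheck) -> bool:
    """Return True only if the hash matches and the signature verifies."""
    # Step 1: recompute the hash from the bytes actually received.
    actual = hashlib.sha256(sub.package).hexdigest()
    claimed = sub.claimed_sha256.lower()
    if not hmac.compare_digest(actual.encode(), claimed.encode()):
        return False
    # Step 2: only then ask whether the author signed that hash.
    return verify(sub.author_key_id, actual.encode(), sub.signature)


def stub_verify(key_id: str, message: bytes, signature: bytes) -> bool:
    """Placeholder verifier for the demo; NOT real cryptography."""
    return key_id == "author-1" and signature == b"demo-signature"


package = b"example plugin archive bytes"
digest = hashlib.sha256(package).hexdigest()
genuine = Submission(package, digest, b"demo-signature", "author-1")
tampered = Submission(b"swapped archive bytes", digest, b"demo-signature", "author-1")
```

Here `accept_submission(genuine, stub_verify)` passes both steps, while `tampered` fails at the hash comparison before the signature is even consulted.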

With this system we have some advantages. First, we know that the plugin author authored the commit. Since a private key is kept secret by the plugin author, even if someone hard forked the repository they can’t authenticate the releases they produce (ACF couldn’t have been forked under this model).

Next, because the repository attests to the authenticity of the plugin, and the repository is trusted by other peers, that information can be trusted as authentic without rerunning the checks. If a repository is particularly paranoid, it can rerun the checks and even issue its own certificate of authenticity.

This isn’t new technology either: JWTs work similarly. The first two portions of a JWT are signed with either a symmetric or an asymmetric key. We’re proposing an asymmetric key to ensure no one party holds “all the keys.”

The best part is that this process can be entirely automated. For example, a private key stored as a secret in GitHub can be used to sign the package, and then GitHub publishes the public keys of users for authentication purposes. The repository can simply check a list of trusted public keys to verify that the right key was used for signing. If the private key is compromised, that key can be dropped from the public keys, and the mirror will no longer consider it a valid authentication source.
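
The trusted key list and its revocation step might be modeled as below. This is a minimal sketch, assuming keys are identified by a SHA-256 fingerprint of their raw bytes; the `TrustedKeys` class and its method names are hypothetical and do not describe any existing GitHub or WordPress API.

```python
import hashlib
from typing import Dict, Optional


class TrustedKeys:
    """Hypothetical registry of public keys a mirror accepts signatures from."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        """Identify a key by the SHA-256 digest of its bytes (an assumption)."""
        return hashlib.sha256(public_key).hexdigest()

    def trust(self, public_key: bytes) -> str:
        """Add a key to the trusted list and return its fingerprint."""
        fp = self.fingerprint(public_key)
        self._keys[fp] = public_key
        return fp

    def revoke(self, fingerprint: str) -> None:
        """Drop a compromised key; later lookups for it return None."""
        self._keys.pop(fingerprint, None)

    def lookup(self, fingerprint: str) -> Optional[bytes]:
        """Return the key to verify against, or None if it is not trusted."""
        return self._keys.get(fingerprint)


keys = TrustedKeys()
author_fp = keys.trust(b"placeholder bytes for the author's public key")
leaked_fp = keys.trust(b"placeholder bytes for a key that later leaks")
keys.revoke(leaked_fp)
```

Once a key is revoked, `lookup` returns `None` for its fingerprint, so any signature made with the compromised private key has nothing left to verify against.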

This process also significantly improves the current model, which offers no verification that a package was provided by the author other than the SVN account of the user being authenticated. This very system allowed the ACF/SCF crisis to occur.

Trust amongst peers

In order for a distributed model to work, there has to be trust between peers. Therefore, we are proposing a model that offers three levels of trust. Peers also always have the option of Zero Trust, meaning they do not trust a peer at all and ignore anything the peer provides.

  1. Basic Trust. This level of trust requires verification of a peer through another, more trusted peer. For example, if a Basic Trust peer were to publish a new plugin, the peer implementing Basic Trust would either have to check the authenticity of the package for itself, or trust another peer to have completed the same checks, before trusting that the peer is authentic. This is useful for new peers that the community does not know well.
  2. Implicit Trust. With Implicit Trust, a peer trusts that another peer does the verification required and is generally reliable as a source of truth. However, it still prefers the next level of trust over this peer. So, if another source provides information that conflicts with a peer at the Implicit Trust level, the higher-level source controls and overwrites the information from the lower-level peer.
  3. Source of Truth Trust. When a peer is a Source of Truth, it is considered an authority for all things related to packages and assets. For example, that might be AspireCloud itself, or another trusted partner that disseminates large numbers of plugin and theme updates from trusted sources. There should be few peers set to the Source of Truth Trust level, and the goal of a distributed system is to have more than one so that the system is not vulnerable to a single source of truth. At this level, though, the trust implied is absolute.

A peer must have at least one other peer that is an Implicit Trust peer in order to receive updates. All peers generally start at the Basic Trust level, and must be manually elevated to Source of Truth Trust. However, when creating a mirror, it is assumed that the mirror would have the option to assert one or two Source of Truth Trust peers to pull its data from.
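
To make the trust levels concrete, here is one way a peer might choose between conflicting claims about the same package. It is a sketch under assumptions, not a specification: the `Trust` enum, the `Claim` record, the `resolve` function, and the peer names are all hypothetical, and a Basic Trust claim is treated as confirmed when a more trusted peer reports the same hash or when this peer has verified the hash itself.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import AbstractSet, Iterable, Optional


class Trust(IntEnum):
    ZERO = 0  # ignore everything this peer provides
    BASIC = 1  # needs confirmation before being believed
    IMPLICIT = 2  # believed unless a Source of Truth disagrees
    SOURCE_OF_TRUTH = 3  # treated as authoritative


@dataclass(frozen=True)
class Claim:
    """Hypothetical statement: 'peer says this package has this hash'."""

    peer: str
    trust: Trust
    package_hash: str


def resolve(
    claims: Iterable[Claim],
    locally_verified: AbstractSet[str] = frozenset(),
) -> Optional[Claim]:
    """Pick the claim to believe, or None if nothing is trustworthy enough."""
    claims = list(claims)
    # Hashes reported by peers at Implicit Trust or above can confirm Basic claims.
    confirmed = {c.package_hash for c in claims if c.trust >= Trust.IMPLICIT}
    usable = []
    for claim in claims:
        if claim.trust == Trust.ZERO:
            continue  # Zero Trust peers are ignored outright
        if claim.trust == Trust.BASIC:
            h = claim.package_hash
            if h not in confirmed and h not in locally_verified:
                continue  # unconfirmed Basic Trust claim
        usable.append(claim)
    # The highest trust level wins; ties keep the first claim seen.
    return max(usable, key=lambda c: c.trust, default=None)


claims = [
    Claim("new-mirror", Trust.BASIC, "hash-a"),
    Claim("community-mirror", Trust.IMPLICIT, "hash-b"),
    Claim("aspirecloud", Trust.SOURCE_OF_TRUTH, "hash-c"),
    Claim("blocked-mirror", Trust.ZERO, "hash-d"),
]
winner = resolve(claims)
```

With all four claims present, the Source of Truth claim overrides the Implicit Trust one. The text above does not say what should happen when two Sources of Truth disagree; this sketch simply keeps the first one it sees, which a real specification would need to settle.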

Conclusion

This is a vision, not a technical architecture for the future. It outlines the goals of a distributed, federated system that implicitly and explicitly trusts other peers, and offers a vision of a world where the ACF/SCF fiasco would be impossible. AspirePress is looking for individuals interested in working on a specification for this vision, and on the development of a standard for mirrors to pass information to one another, not just in the WordPress space but in any space where code repositories are distributed over HTTP(S).
