How the Hypercore Protocol Works
Hypercore is a distributed append-only log
Hypercore comes with a secure transport protocol, making it easy to build fast and scalable peer-to-peer applications. Think lightweight blockchain crossed with BitTorrent.
If you are not familiar with append-only logs, they are basically just lists you can only append to. If you think about it in terms of normal array operations, it is a log where you can only call get(index), push(data) and retrieve the log length, but where you can never overwrite old entries.
Hypercore allows you to quickly distribute these kind of logs in a peer-to-peer fashion. Each peer can choose to download only the section of the log they are interested in, without having to download everything from the beginning.
Using append-only logs, Hypercore can easily generate compressed bitfields describing which portions of a log a peer has. This, among other things, helps make the replication protocol light and efficient.
Secured by Merkle trees and cryptography
Hypercore verifies log contents by building a Merkle tree using the BLAKE2b-256 hash function. Peers can also use this Merkle tree to only download the parts of the log they are interested in, and redistribute these parts to others.
Merkle trees are widely used in many distributed systems. They are normally only used to verify static content, because when content changes the root of the tree changes as well. To support mutable datasets, Hypercore uses asymmetric cryptography to sign the root of the Merkle tree as data is appended to it.
Only the owner of the private key can append to the log. But one Hypercore is rarely used on its own -- more powerful, multi-user data structures can be created by combining multiple cores. Hypercore's main purpose is to be a building block in other things.
Hyperdrive is a P2P filesystem
Hyperdrive is designed to help you share files quickly and safely, directly from your computer.
Hyperdrive is built using Hypercores. Using an append-only index you can find only the files you are interested from a large drive in milliseconds -- all P2P.
By default, readers only download the portions of files they need, on demand. You can stream media from friends without jumping through hoops! Seeking is snappy and there's no buffering.
If you want to learn more about the code you can see the Node.js implementation in the GitHub repository
If you want to download or share filesystems, we suggest you read on and try out the daemon.
How does Hyperdrive work?
Hyperdrive indexes filenames into a Hypercore. To avoid having to scan through this entire Hypercore when you want to find a specific file or folder, filenames are indexed using an append-only hash trie, which we call Hypertrie. The hash trie basically functions as a fast append-only key value store with listable folders.
If you are looking for a simple database abstraction on top of Hypercore you can also use Hypertrie standalone, outside of Hyperdrive. In you want to learn more about how to use it, we suggest you look at the Hypertrie repository
Since it builds on top of the append-only log, it inherits the same guarantees of every change being versioned by default, making it easy to see historical changes and prevent accidental data loss.
In addition to the Hypertrie, which we refer to as the metadata log, Hyperdrive uses another Hypercore to store the binary file content of each file you insert. This dual-log design makes it easy to replicate or watch only the metadata log, without content, if that is what you are interested in.
Each entry in the Hypertrie links to the content log to signal where a file's binary data starts and ends. Additionally, the entry contains all the normal POSIX data you'd be interested in, such as modification time, creation time, file modes, etc.
For better composability and collaboration, an entry can also link to a completely different Hyperdrive or Hypercore. We call this feature mounts. Even though it's ostensibly simple, it can be used to build powerful collaborative features.
Internally we have been using Hyperdrive mounts for a concept we call "groupware", where each user mounts their own drive inside a single shared one, then applications render multi-user views over the group drive. The groupware pattern can be used to build lightweight, Dropbox-like applications, among others.
We are always exploring new ways to enhance Hyperdrive! If you are interested in collobarating with us always feel free to to open an issue on Github, reach out at @hypercoreproto or join our Discord.
Hyperswarm is a DHT for the home
Hyperswarm combines a Kademlia-based distributed hash table (DHT) for global discovery with MDNS to discover peers on local networks.
A Kademlia DHT
As with many P2P projects, Hyperswarm uses a DHT to discover peers on the internet. Hypercore maintains a hash of its public key, called a "discovery key", that is used as the topic with which peers share their IPs and ports to discover each other.
Hypercore comes with a capability system wherein each peer has to verify they know the correct public key before they start sharing data. Even if you know a Hypercore's discovery key, you must also know the public key in order to download data from peers.
The DHT used in Hyperswarm is based on Kademlia, a UDP-based DHT which is used by many projects, including BitTorrent. You can learn a lot more about Kademlia in the Kademlia paper.
DHTs scale well in general, but are notoriously hard to work with. Hyperswarm employs a series of heuristics to answer queries quickly and garbage collect stale data as fast as possible.
As a general note, most DHTs (Hyperswarm included) generally expose routing information such as your IP/port to help route requests. Privacy-preserving DHTs are an active research area by many participants in the P2P ecosystem.
Ideal P2P networks should be able to connect any two peers together, wherever they are. In practice, this is a challenge due to firewalls which reject incoming connections and NATs which mask your IP. To help “break through” firewalls, a technique called UDP holepunching is used.
Traditionally UDP holepunching requires the use of centralised servers that are preknown by each peer in a network.
Hyperswarm expands on holepunching by making it a first-class feature: any peer in the DHT can help you holepunch to any other peer that it knows about.
After finding the relevant peers, Hyperswarm utilises the UTP transport protocol, to make reliable connection streams between peers. For non-holepunching cases, TCP is used as well.
Importantly, Hyperswarm connections are unencrypted -- encryption must be added separately. As an example, once Hyperswarm has established a connection, Hypercore wraps it in a Noise protocol stream to add end-to-end encryption.