Architecture spec
The long-form design doc: protocols, crypto, invariants.
The long-form companion to the Configuration, CLI, and REST API references. It explains why the daemon looks the way it does: the protocols, the crypto, the trust boundaries, and the small set of invariants every later chapter takes for granted.
The shape of MeshHold is a single binary that is, at once, your local store, your gossip relay, your VPN exit, your chat server, and your S3 endpoint. All of those layers share one Go process, one libp2p host, one Badger metadata store. That is the trick. Most of the design below is a consequence of collapsing those roles into one box.
1. The shape of the system
A MeshHold network is a set of peer nodes speaking libp2p over a shared
swarm key. There is no central directory, no coordinator, no DNS-rooted
identity. Every node is symmetrical at the protocol level; the asymmetry comes
from configuration: which vault keys a node holds, whether it is marked
reliable, which mgmt keys it has provisioned.
The diagram is deliberately heterogeneous. A trusted home server, a public VPS, a phone, a cheap untrusted VPS holding only ciphertext, a Raspberry Pi camera: none of them are special at the network layer. Their roles differ only in which keys are present and which transports are enabled.
What MeshHold deliberately is not:
- Not a blockchain. No global ledger, no global consensus, no proof-of-anything.
- Not federation. Nodes do not route on behalf of "users on other servers". An identity (a peer id) lives on exactly one daemon at a time.
- Not a CDN. Replication is targeted at durability and locality, not at edge caching for unknown clients.
- Not a public network. A node with no swarm key can't open a connection. Everything below assumes you're already past that gate.
2. The node daemon
One Go process per machine, organised in four cooperating layers.
- External clients: the bundled Web UI, the Android app, the Windows tray launcher, the meshhold CLI, plus any third party that wants to talk S3 or post a webhook.
- API surface: REST for state-changing calls, Server-Sent Events for the live event stream, WebSockets for call media, an embedded S3-compatible endpoint, inbound webhook receivers. All on the same listener.
- Core engine: replication manager, catalog/SongAgg, chat broker, call signaling, tunnels, push gateway, agent driver, audit log, mgmt-key verifier. These talk to each other through Go channels and the shared metadata store, not through internal RPC.
- libp2p host + storage: one libp2p host with all /meshhold/* protocol handlers registered, on top of a Badger metadata store and a blocks directory on the filesystem.
The single-process design is load-bearing. The replication manager can read a holders table in microseconds. The chat broker can hand a freshly-decrypted message to the SSE fan-out without a network hop. The tray launcher's loopback REST request and the libp2p stream from a remote peer end up in the same handler with the same authz check.
Disk layout
On Linux the base is ~/.meshhold/; on Windows it's
%LOCALAPPDATA%\MeshHold\. The interesting subdirectories:
| Path | Purpose |
|---|---|
| meta/ | Badger key-value store: vaults, holders, peers, chats, agent sessions, audit log. |
| meta/identity.key | The node's libp2p Ed25519 private key. Wrapped by the at-rest master key when enabled. |
| blocks/ | Convergent ciphertext blocks, content-addressed by SHA-256. |
| vaults/<id>/ | Trusted-vault plaintext folders (only for trusted nodes). |
| audit.log | Append-only, signed audit trail. |
| config.yaml | The single source of truth at startup. See the Configuration reference. |
Nothing outside this base is read or written, with two exceptions: the
optional system-VPN helper service on Windows, and OS keychains when
at_rest_encryption.source: keychain is configured.
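Resolving the base directory is the only platform-specific step in the layout above. The Go sketch below is illustrative, not the daemon's code: baseDir, under, and sep are hypothetical names, and home/localAppData are passed in where a real daemon would read os.UserHomeDir() and the LOCALAPPDATA environment variable.

```go
package main

import (
	"fmt"
	"strings"
)

// sep returns the path separator for the target OS. Paths are built by hand
// instead of with path/filepath so the sketch gives the same answer on any
// host OS.
func sep(goos string) string {
	if goos == "windows" {
		return `\`
	}
	return "/"
}

// baseDir resolves the MeshHold base directory: %LOCALAPPDATA%\MeshHold on
// Windows, ~/.meshhold on Linux. Treating every non-Windows OS like Linux is
// an assumption of this sketch.
func baseDir(goos, home, localAppData string) string {
	if goos == "windows" {
		return strings.TrimRight(localAppData, `\`) + `\MeshHold`
	}
	return strings.TrimRight(home, "/") + "/.meshhold"
}

// under joins path elements onto the base directory with the OS separator.
func under(goos, base string, elems ...string) string {
	return strings.Join(append([]string{base}, elems...), sep(goos))
}

func main() {
	linux := baseDir("linux", "/home/alice", "")
	fmt.Println(under("linux", linux, "meta", "identity.key"))
	// /home/alice/.meshhold/meta/identity.key

	win := baseDir("windows", "", `C:\Users\alice\AppData\Local`)
	fmt.Println(under("windows", win, "vaults", "photos"))
	// C:\Users\alice\AppData\Local\MeshHold\vaults\photos
}
```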
3. Identity, swarm, peer discovery
Every node has a stable peer id derived from an Ed25519 keypair stored in
meta/identity.key. The peer id is what every other surface references: the
holders table, mgmt-key grants, push roster, audit log all pivot on it.
The swarm key is a 256-bit shared secret in libp2p's standard
/key/swarm/psk/1.0.0/ format. It is the outer gate to the mesh: libp2p
refuses to complete a connection if the peer can't prove possession of the
same PSK. A node booted without a swarm key runs in "limited mode": the REST
API and S3 listener come up, but the P2P stack is dormant.
Peer discovery happens four ways, in roughly this order:
- bootstrap_peers: config-supplied multiaddrs the daemon dials at start.
- /meshhold/hello/1.0: on every fresh connection, peers exchange the list of other peers they've seen recently. A single working bootstrap eventually yields the whole reachable mesh.
- mDNS: for LAN. Disabled on hardened nodes via node.mdns_enabled: false.
- /meshhold/topology/1: a GossipSub topic where every node publishes a periodic summary (its current peer set, capabilities, mv-bytes counters). This is what powers the Network page in the Web UI.
NAT'd peers reserve Circuit Relay v2 slots on publicly reachable peers via
the standard libp2p mechanism (node.relay.auto_dial). Publicly reachable
nodes flip node.relay.serve: true (and usually nat_service: true) to
volunteer.
4. Blocks and convergent encryption
Files become blocks. Small files (≤512 KiB) fit in one block. Larger files are split into 4 MiB chunks. Every block is encrypted with a key derived from its own plaintext hash; this is the convergent-encryption property.
The flow per block:
- h = SHA-256(plaintext)
- key = HMAC-SHA256(h, "convergent"), 32 bytes for AES-256
- nonce = first 12 bytes of HMAC-SHA256(h, "nonce"), deterministic
- ciphertext = AES-256-GCM(plaintext, key, nonce)
- cid = SHA-256(ciphertext), the on-wire and on-disk identifier
Property: identical plaintext → identical ciphertext → identical CID. The mesh stores one copy of every block, no matter how many devices independently contain the same file.
The price is the standard convergent-encryption caveat: an attacker who already has a candidate plaintext can verify whether you've stored it. For the threat models MeshHold targets (your photos, your documents, your chats) that's an acceptable trade. A random-nonce mode is planned but not shipped.
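The per-block flow translates almost line for line into Go's standard library. This is a sketch, not the daemon's implementation: it assumes h is the HMAC key and the label is the message (the notation above doesn't pin down argument order), that the CID is hex-encoded, and that the GCM tag stays attached to the ciphertext. split and encryptBlock are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const (
	singleBlockMax = 512 << 10 // files up to 512 KiB stay in one block
	chunkSize      = 4 << 20   // larger files are cut into 4 MiB chunks
)

// split cuts a file into blocks following the size rules above.
func split(file []byte) [][]byte {
	if len(file) <= singleBlockMax {
		return [][]byte{file}
	}
	var blocks [][]byte
	for len(file) > 0 {
		n := chunkSize
		if len(file) < n {
			n = len(file)
		}
		blocks = append(blocks, file[:n])
		file = file[n:]
	}
	return blocks
}

// mac is HMAC-SHA256 keyed with k over msg (argument order assumed).
func mac(k []byte, msg string) []byte {
	m := hmac.New(sha256.New, k)
	m.Write([]byte(msg))
	return m.Sum(nil)
}

// encryptBlock derives key and nonce from the plaintext hash, seals with
// AES-256-GCM, and names the result by the SHA-256 of the ciphertext.
func encryptBlock(plaintext []byte) (cid string, ciphertext []byte, err error) {
	h := sha256.Sum256(plaintext)
	key := mac(h[:], "convergent")   // 32 bytes, so AES-256
	nonce := mac(h[:], "nonce")[:12] // deterministic 96-bit GCM nonce
	block, err := aes.NewCipher(key)
	if err != nil {
		return "", nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", nil, err
	}
	ciphertext = gcm.Seal(nil, nonce, plaintext, nil)
	sum := sha256.Sum256(ciphertext)
	return hex.EncodeToString(sum[:]), ciphertext, nil
}

func main() {
	a, _, _ := encryptBlock([]byte("same photo bytes"))
	b, _, _ := encryptBlock([]byte("same photo bytes"))
	c, _, _ := encryptBlock([]byte("different bytes"))
	fmt.Println(a == b, a == c)                  // true false
	fmt.Println(len(split(make([]byte, 9<<20)))) // 3
}
```

Reusing a GCM nonce is normally catastrophic. It is harmless here only because the key is derived from the same plaintext, so any given (key, nonce) pair only ever encrypts one message.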
5. Hash chains and versioning
Files are mutable. To track versions without a central coordinator, every
file carries a hash chain: a sequence of content_hash pointers, each
referencing the previous version. A version record is metadata, not a copy:
old plaintext is not preserved.
Linear history (each version points at its parent): v1 (alice, AAA) ← v2 (alice, BBB) ← v3 (alice, CCC).
Fork: v2' (bob, offline, DDD) also has BBB as its parent, so BBB has two children.
When two devices edit the same file while disconnected, the resulting chain forks. The catalog exchange surfaces this on the next sync; the user is shown both heads and asked to resolve. Forks are visible to every node in the swarm, including untrusted holders; that is what lets an untrusted node know when its locally held ciphertext is stale and eligible for eviction.
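Detecting a fork only needs the parent pointers. A small Go sketch, assuming each version record carries its own content hash and its parent's (field names are illustrative): any version that no other version names as a parent is a head, and more than one head means the user has to resolve.

```go
package main

import (
	"fmt"
	"sort"
)

// Version is one link in a file's hash chain. Parent is "" for the first
// version. These names are illustrative, not the daemon's schema.
type Version struct {
	Hash   string
	Parent string
}

// heads returns the versions nothing else points at, sorted. One head means
// linear history; two or more means the chain forked.
func heads(versions []Version) []string {
	hasChild := make(map[string]bool)
	for _, v := range versions {
		if v.Parent != "" {
			hasChild[v.Parent] = true
		}
	}
	var out []string
	for _, v := range versions {
		if !hasChild[v.Hash] {
			out = append(out, v.Hash)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	chain := []Version{
		{Hash: "AAA"},
		{Hash: "BBB", Parent: "AAA"},
		{Hash: "CCC", Parent: "BBB"}, // alice, v3
		{Hash: "DDD", Parent: "BBB"}, // bob, edited offline
	}
	fmt.Println(heads(chain)) // [CCC DDD]
}
```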
6. Vaults
A vault is the unit of access. Concretely, it's a tuple of:
- a stable id and human name,
- a content key (or absent, if this node is untrusted for it),
- a policy (replication factor, hard quota, retention),
- a type: folder, chat, tunnel, or agent.
The type determines the API surface mounted on top:
| Type | API | Stored bytes |
|---|---|---|
| folder | /api/v1/vaults/{id}/{path:*} (files and folders) | encrypted block tree |
| chat | /api/v1/rooms/{id}/messages (chat history) | encrypted message log |
| tunnel | /api/v1/tunnels/{id} (capability grants) | tunnel scope + mgmt keys |
| agent | /api/v1/sessions/{sid} (AI agent state) | session transcripts + attachments |
Trust is a per-vault property of the node, not of the user. A laptop can be trusted for the family-photos vault and untrusted for the company-secrets vault; for the latter, it sees only the ciphertext blocks it helps replicate.
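As a data structure, the vault tuple is small. The Go struct below is a sketch with hypothetical field names; its point is that trust is not a flag but the presence or absence of the content key, so the same node answers differently per vault.

```go
package main

import "fmt"

// VaultType is one of the four types in the table above.
type VaultType string

const (
	Folder VaultType = "folder"
	Chat   VaultType = "chat"
	Tunnel VaultType = "tunnel"
	Agent  VaultType = "agent"
)

// Policy groups the per-vault knobs; field names and units are hypothetical.
type Policy struct {
	ReplicationFactor int
	HardQuotaBytes    int64
	RetentionDays     int
}

// Vault is the tuple described above, as seen by one node.
type Vault struct {
	ID, Name   string
	ContentKey []byte // nil when this node is untrusted for the vault
	Policy     Policy
	Type       VaultType
}

// Trusted reports whether this node can decrypt the vault: trust is nothing
// more than holding the content key.
func (v Vault) Trusted() bool { return v.ContentKey != nil }

func main() {
	placeholderKey := make([]byte, 32) // stands in for a real content key
	laptop := []Vault{
		{ID: "family-photos", Type: Folder, ContentKey: placeholderKey, Policy: Policy{ReplicationFactor: 3}},
		{ID: "company-secrets", Type: Folder, Policy: Policy{ReplicationFactor: 2}},
	}
	for _, v := range laptop {
		fmt.Println(v.ID, v.Trusted())
	}
	// family-photos true
	// company-secrets false
}
```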
7. Replication
Each vault declares a replication factor: how many reliable nodes should hold every block. Replication is gossip-driven, not coordinator-driven.
The mechanism in steps:
- When a node accepts a new block (a write from a local client, or a fetch from a peer), it publishes a block-have(cid, vault) announcement on the /meshhold/replication/<vault> pubsub topic.
- Every member of the swarm keeps a holders table mapping cid → set of peer ids that announced have. The table is bounded by node.holder_ttl; entries expire if not refreshed.
- A periodic replication cycle (node.replication_min_period .. node.replication_max_period) walks every block this node is interested in, asks the holders table how many reliable peers currently hold it, and either fetches a copy (if short) or volunteers to drop one (if over RF and tight on disk).
- Block transfer happens over /meshhold/block/1.0: one stream per block, length-prefixed.
reliable: true means "long-lived holder candidate". Servers and VPS default
to true; phones default to false. The replication scheduler counts only
reliable peers when computing current_count, but accepts blocks from
anyone. A non-reliable peer participates in distribution without inflating
the RF math.
When the mesh has no reliable peers at all (a user with only phones and
laptops), the math collapses gracefully: current_count never reaches RF,
so every peer with free disk just takes a copy. Data ends up on every
device, which is exactly what the user wants.
8. Mesh topology and gossip
A small fixed set of GossipSub topics carries everything that's not a direct RPC:
| Topic | Carrier of |
|---|---|
| /meshhold/topology/1 | Per-node summary heartbeat (peers, caps, byte counters) |
| /meshhold/replication/<vault> | block-have / block-want announcements |
| /meshhold/chat/<room> | Live chat messages for a single room |
Hello (/meshhold/hello/1.0) is not gossip; it's a direct RPC on every
fresh connection. Catalog (/meshhold/catalog/1.0) is also direct: when a
node opens a fresh connection and shares a vault, the two peers exchange
their per-vault catalog state (Merkle tip + a delta).
Bandwidth budgeting matters at scale. The /meshhold/speedtest/1.0 protocol
is a short ping-pong that fills the topology heartbeat's mv_bytes counters
shown on the Network page.
9. Protocol map
The complete set of /meshhold/* protocols the daemon registers on its
libp2p host:
A few invariants:
- Versioning is in the path. /meshhold/tunnel/1.0 is single-hop; /meshhold/tunnel/1.1 is multi-hop. A peer that supports only 1.0 is silently treated as single-hop-only.
- Authorisation is per-protocol. Anyone in the swarm can dial /meshhold/block/1.0. Only a peer presenting a current mgmt-key proof can dial /meshhold/tunnel/1.1 with the tunnel cap. The verifier lives next to the handler.
- Bandwidth is categorised. The metrics layer (metrics/categories.go) classifies every protocol into Block/Tunnel/Chat/Gossip buckets so the Network page can render coloured throughput rings.
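The first and third invariants come down to string checks on protocol ids. A Go sketch with hypothetical helper names; the prefix rules in category are a guess at what metrics/categories.go does, and where unlisted protocols such as calls are counted is not specified, so defaulting them to Gossip is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// multiHop reports whether a peer can join a chained tunnel. A peer that
// advertises only /meshhold/tunnel/1.0 is treated as single-hop, not as an
// error.
func multiHop(peerProtocols []string) bool {
	for _, p := range peerProtocols {
		if p == "/meshhold/tunnel/1.1" {
			return true
		}
	}
	return false
}

// category buckets a protocol id for throughput accounting. Everything not
// matched falls into Gossip in this sketch.
func category(protocol string) string {
	switch {
	case strings.HasPrefix(protocol, "/meshhold/block/"):
		return "Block"
	case strings.HasPrefix(protocol, "/meshhold/tunnel"): // tunnel and tunnel-data
		return "Tunnel"
	case strings.HasPrefix(protocol, "/meshhold/chat/"):
		return "Chat"
	default:
		return "Gossip"
	}
}

func main() {
	fmt.Println(multiHop([]string{"/meshhold/block/1.0", "/meshhold/tunnel/1.0"})) // false
	fmt.Println(category("/meshhold/tunnel-data/1.0"))                           // Tunnel
}
```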
10. Transports and obfuscation
The libp2p host can bind multiple transports simultaneously. Out of the box:
- plain TCP: the default. Noise handshake over TCP. Fast, but a censor with DPI can fingerprint the handshake.
- REALITY: TLS-REALITY transport. The listener does an unauthenticated TLS handshake with the client and forwards it to a real upstream (node.obfs.reality.dest) unless the client proves it knows the REALITY X25519 key. To a passive observer this is just TLS to whatever cover domain you pointed at.
- SSH masquerade: same idea, but with an SSH banner and key-exchange prefix. The listener can be probed and will hand back a real SSH-2.0-OpenSSH_… banner before the secret handshake begins.
node.obfs.order controls outbound dial priority. In hostile environments
you put ["reality", "ssh", "plain"] so the daemon tries the obfuscated
transports first and falls back to plain TCP only when neither of them connects.
11. Tunnels and port forwarding
A tunnel is a length-prefixed bidirectional substream routed across one or more relay hops. The same primitive powers everything that needs to "talk to something behind a NAT".
The control protocol is /meshhold/tunnel/1.1 (chained); the data plane is
/meshhold/tunnel-data/1.0. A tunnel-open request carries:
- the target peer id at the far end,
- the protocol id the far end should dial when the substream is mounted (e.g. /meshhold/block/1.0 for block transfer, an arbitrary port number for port-forward),
- a mgmt-key signature with the tunnel capability.
The chain works by having each hop recursively open a fresh tunnel to the next hop, carrying the original request. Hops only know their immediate neighbours. The end-to-end stream is wrapped in a fresh Noise session authenticated by the entry/exit peer ids, so middle hops see only ciphertext even though they're already inside the swarm-keyed connection.
Port forwarding is ssh -L / ssh -R reimagined: the
Port Forward feature reuses the
same tunnel machinery, with a TunnelListenRequest that asks a remote peer
to listen on a host:port and forward connections back through the tunnel.
The @reverse/<id> sentinel in the substream protocol marks it as the
"server pushes to client" variant.
12. Chat
Chat rooms are vaults of type chat. Each room has its own content key,
shared on invite. Two protocols carry messages:
- /meshhold/chat/<room>: a GossipSub topic for the live path. A freshly posted message reaches every connected member in under a second.
- /meshhold/chat/sync/1.0: a request/reply protocol for backfill. When a peer (re-)joins, it asks an active member for "messages since last_seq" and replays them.
Messages are AEAD-encrypted with the room content key before being published, so an untrusted holder of the room's ciphertext blocks (if you've chosen to replicate chat history into a vault) sees only opaque bytes.
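The backfill half of /meshhold/chat/sync/1.0 reduces to "everything after last_seq". A Go sketch, assuming messages are stored sorted by a per-room sequence number; Room, Msg, Since, and the page limit are hypothetical. Bodies stay sealed throughout, so the sketch never touches plaintext.

```go
package main

import (
	"fmt"
	"sort"
)

// Msg is a stored chat message: a per-room sequence number plus the
// AEAD-sealed body. Only Seq matters for backfill.
type Msg struct {
	Seq    uint64
	Sealed []byte
}

// Room is an in-memory stand-in for an active member's room log, kept
// sorted by Seq.
type Room struct{ log []Msg }

// Since answers "messages since last_seq": everything with Seq > lastSeq,
// in order, capped at limit so one request cannot pull an unbounded backlog.
func (r *Room) Since(lastSeq uint64, limit int) []Msg {
	i := sort.Search(len(r.log), func(i int) bool { return r.log[i].Seq > lastSeq })
	end := len(r.log)
	if end-i > limit {
		end = i + limit
	}
	return append([]Msg(nil), r.log[i:end]...) // copy, so callers can't mutate the log
}

func main() {
	room := &Room{log: []Msg{{Seq: 1}, {Seq: 2}, {Seq: 3}, {Seq: 4}}}
	// A member that last saw seq 2 rejoins and replays the rest.
	for _, m := range room.Since(2, 100) {
		fmt.Println(m.Seq)
	}
	// 3
	// 4
}
```

A rejoining peer that hits the limit simply asks again with the highest Seq it received.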
Push notifications for offline members are produced by a per-room background
loop that watches /meshhold/chat/<room> and, after a configurable silence
(45 s by default), sends a UnifiedPush wake-up to subscribed endpoints
(see Push notifications below).
13. Calls
Audio and video calls run over libp2p using three protocols:
| Protocol | Purpose |
|---|---|
| /meshhold/call/1.0 | Signaling: ring, accept, hang up, ICE-equivalent capability ads |
| /meshhold/call-media/1.0 | Media: WebCodecs-encoded frames wrapped in ChaCha20-Poly1305 |
| /meshhold/call-relay/1.0 | Forwarding when the two endpoints can't connect directly |
The browser side uses WebCodecs for encode/decode (H.264/Opus by default; falls back to VP8). The audio/video frames are not WebRTC β they are framed inside our own length-prefixed media protocol, AEAD-wrapped with a per-call key, and shipped over a normal libp2p stream. That keeps the crypto in our hands (the same Noise + content-key stack as the rest of the system) and lets a relay node forward frames without ever decrypting them.
Group calls (more than two parties) are not in scope β see non-goals.
14. Push notifications
Most phones can't hold a TCP connection open in deep sleep. MeshHold uses UnifiedPush for offline wake-ups:
- Each phone registers a PushEndpoint (a UnifiedPush distributor URL) in its profile.
- The phone picks the lowest-latency swarm peer that has the push.gateway: true config flag and asks it to be its push relay.
- The gateway daemon watches the rooms the phone is a member of. After a silence of push.silence_threshold (45 seconds by default), long enough that the phone is genuinely backgrounded, the gateway POSTs a tiny wake-up payload to the phone's UnifiedPush endpoint.
- The wake-up contains only a room id and a message-seq high-water mark. No content, no sender, no preview. The phone wakes, opens a fresh libp2p connection, fetches the actual messages over /meshhold/chat/sync/1.0, and renders the native notification locally.
The gateway role is config-driven (a flag on the node), not negotiated. A network without a gateway-flagged node simply has no push.
15. AI agents
An agent instance is a vault of type agent plus a driver (Claude Code,
OpenCode, …). The driver runs the chosen CLI as a child process, with
CLAUDE_CONFIG_DIR / equivalent pointed at a per-instance directory inside
meta/. Sessions, transcripts, attachments, and MCP approvals are all
stored back in Badger.
Remote access goes through /meshhold/plugin/agent/1.0, an HTTP/1.1 stream
over libp2p, much like the tunnel protocol but with HTTP semantics on top.
The Web UI on device A can hit /api/v1/sessions/... on its local daemon,
which transparently routes the request via the agent plugin protocol to the
node that actually hosts the agent instance.
Crucially: agents do not get vault tools. The driver only exposes the chat surface (prompt, attachments, model selection, approvals). MCP servers are scoped per instance. The tenant boundary is the user, not the node (see the trust diagram below).
16. The S3 listener
When node.s3.enabled: true, the daemon mounts an embedded S3-compatible
HTTP listener on node.s3.listen_addr (default 127.0.0.1:3900, loopback to
prevent accidental exposure).
- Buckets = vault aliases. s3://meshhold/photos/holiday.jpg resolves by looking up "photos" in the per-key bucket map; the rest of the path is a vault path.
- Sig v4 only. Both header-auth and presigned-URL flows are supported. The Sig v4 scope's region is node.s3.region (default meshhold).
- Path-style by default. Virtual-hosted style (<bucket>.<base_domain>) activates only when node.s3.base_domain is set.
The single-PUT cap (node.s3.max_put_bytes, default 64 MiB) forces large
uploads to multipart, which lets the convergent-encryption layer chunk on
the same 4 MiB boundary the rest of the system uses.
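Path-style resolution and the single-PUT cap, sketched in Go. resolve, checkPut, and the error strings (borrowed from S3's NoSuchBucket and EntityTooLarge codes) are illustrative assumptions; Sig v4 verification, which is what would select the bucket map for the signing access key, is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const maxPutBytes = 64 << 20 // node.s3.max_put_bytes default

// resolve maps a path-style request path (/bucket/key...) to a vault id and
// a path inside that vault, using the bucket map of the calling key.
func resolve(buckets map[string]string, urlPath string) (vaultID, vaultPath string, err error) {
	bucket, key, _ := strings.Cut(strings.TrimPrefix(urlPath, "/"), "/")
	id, ok := buckets[bucket]
	if !ok {
		return "", "", fmt.Errorf("NoSuchBucket: %q", bucket)
	}
	return id, key, nil
}

// checkPut rejects single PUTs above the cap, steering clients to multipart.
func checkPut(contentLength int64) error {
	if contentLength > maxPutBytes {
		return errors.New("EntityTooLarge: use multipart upload")
	}
	return nil
}

func main() {
	buckets := map[string]string{"photos": "vault-7f3a"} // alias -> vault id (hypothetical)
	id, p, err := resolve(buckets, "/photos/holiday.jpg")
	fmt.Println(id, p, err)          // vault-7f3a holiday.jpg <nil>
	fmt.Println(checkPut(100 << 20)) // EntityTooLarge: use multipart upload
}
```

Because 64 MiB is a whole multiple of 4 MiB, every part of a multipart upload can land on the same chunk boundary the block layer already uses.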
17. The REST API surface
Everything every UI does is REST. See the REST API reference for the full catalogue. A few cross-cutting invariants worth highlighting here:
- Bearer auth on every state-changing route. The bearer is minted by POST /auth/login against the Web UI password, or by POST /auth/bootstrap-exchange against a one-shot ticket from the tray launcher's loopback IPC pipe.
- SSE for events. GET /events/stream streams every state-change event the caller is authorised to see: vault updates, holder changes, call rings, agent stream events. Web UIs subscribe once.
- WebSocket for two media surfaces only: GET /calls/{id}/media and GET /rooms/{id}/stream, both because they carry binary frames.
- Errors are uniform. Every error is a {"error":"message"} body plus a non-2xx status. 507 Insufficient Storage is the only "special" status, used to surface vault hard-quota hits with extra fields.
18. Trust boundaries
There are three independent gates, and a role is the Cartesian product of which gates you've passed.
A few easy mistakes to avoid:
- A bearer token grants whatever the node can see, not whatever the user owns. If Alice's laptop and Alice's home server both speak the same vault, she still needs two bearers, one for each daemon. The daemons talk to each other through libp2p tunnels, not by sharing tokens.
- An "untrusted" node is not "less trusted by reputation". It's just a node without that vault's key. The same node is fully trusted for vaults whose keys it does hold.
- The swarm key is not the vault key. Losing the swarm key locks you out of the network; losing a vault key locks you out of a vault. Rotating the swarm key does not invalidate vault content.
19. Invariants and non-goals
Things the daemon promises will always be true:
- Identical plaintext → identical block bytes on the wire and on disk. Convergent encryption is the dedup primitive.
- A peer with no swarm key cannot complete a libp2p handshake. Limited mode runs the REST/S3 listeners only.
- Replication is per-block, not per-file. Small files don't pay the cost of large ones.
- Block storage is bounded by node.blocks_max_bytes and node.blocks_reserve_bytes. Replication evicts before either limit is crossed.
- The mgmt-key verifier is the only path to tunnel/camera caps. No flag and no special peer id opens those gates. The MgmtKeysPanel is the user-facing surface.
Things MeshHold deliberately does not do:
- No group calls. 1:1 calls only. Group multimedia is a separate set of problems (mixers, jitter buffers, SFU vs. MCU) that we chose not to inherit.
- No CRUD for webhooks or push gateways. Both are config-only. A web surface for either would let a compromised bearer turn the daemon into an arbitrary HTTP-egress device. Config files require shell access.
- No outbound HMAC signing on webhooks. The receiver authenticates by the URL secret in the path, full stop.
- No interactive prompts in the CLI. Every meshhold subcommand is scriptable. The interactive flows live in the Web UI.
- No random-nonce mode for blocks (yet). Convergent dedup is the default. A future opt-in random-nonce vault flag is on the roadmap.
20. Where this leaves you
If you want to read the code, the starting points are:
- internal/p2p: every /meshhold/* protocol implementation.
- internal/replication: the holders table and replication cycle.
- internal/vaults: vault types, trust resolution, convergent encryption.
- internal/tunnel: the multi-hop tunnel machinery; everything else that needs "reach a peer through other peers" is a thin wrapper around it.
- internal/agent: the agent driver, OpenCode/Claude Code integrations, plugin protocol.
- internal/api: the REST layer and SSE fan-out.
The Configuration reference is the inventory of every knob; the CLI reference is the operator surface; the REST API reference is what every client looks like from the outside. This document is the reason they all fit together.