15 votes

WebMesh: Yet Another WireGuard Mesh/VPN Solution

10 comments

  1. [7]
    tinyzimmer

    Hi friends!

    Showing off my new project, Webmesh. It is similar to projects like Tailscale, but with a controller-less architecture where any node can help facilitate new connections to the mesh. More info can be found on the project website: https://webmeshproj.github.io/

    Happy to hear any feedback :)

    6 votes
    1. [6]
      vord

      One thing that Zerotier and Tailscale do well (ZT in particular, as that's what I use) is being able to hole punch through hostile networks like double NAT and restrictive firewalls. They also handle rapidly changing nodes, like cellphones connecting to different networks, in part because the central router serves as the fallback.

      Are there any things this does to help with that? Bootstrapping in particular seems difficult in a changing environment.

      4 votes
      1. [5]
        tinyzimmer

        It's an area where I am unsure how to proceed currently.

        Under default conditions, nodes peer with the ones they are able to reach. The ones they cannot reach have their traffic handled by the nodes that can reach those targets. In that way the network branches out, rather than directly connecting everyone. The admin API then exposes ways to explicitly define the hops that take place.
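
        In WireGuard terms, that branching mostly comes down to AllowedIPs: a directly reachable peer's entry can also cover the addresses of nodes behind it, so their traffic rides through that hop. A minimal sketch with wgctrl (the key, addresses, and device name are made up for illustration, not the project's actual wiring):

        ```go
        package main

        import (
            "log"
            "net"

            "golang.zx2c4.com/wireguard/wgctrl"
            "golang.zx2c4.com/wireguard/wgctrl/wgtypes"
        )

        func main() {
            client, err := wgctrl.New()
            if err != nil {
                log.Fatal(err)
            }
            defer client.Close()

            // Hypothetical public key of the one peer we can reach directly.
            peerKey, err := wgtypes.ParseKey("4Ws0bmjsJZyBYF1GOvQgficYOt9ewTOLcjQA7NJhpkQ=")
            if err != nil {
                log.Fatal(err)
            }

            _, direct, _ := net.ParseCIDR("10.10.0.2/32")   // the peer itself
            _, indirect, _ := net.ParseCIDR("10.10.0.3/32") // a node only it can reach

            // Traffic for the unreachable node is simply routed through the
            // reachable peer by widening that peer's AllowedIPs.
            err = client.ConfigureDevice("wg0", wgtypes.Config{
                Peers: []wgtypes.PeerConfig{{
                    PublicKey:         peerKey,
                    Endpoint:          &net.UDPAddr{IP: net.ParseIP("203.0.113.10"), Port: 51820},
                    ReplaceAllowedIPs: true,
                    AllowedIPs:        []net.IPNet{*direct, *indirect},
                }},
            })
            if err != nil {
                log.Fatal(err)
            }
        }
        ```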

        I've added the ability to facilitate so-called "direct peerings" (example here: https://github.com/webmeshproj/node/tree/main/examples/direct-peerings) in a way very similar to how Tailscale/ZT do it, by using ICE connections. But this is obviously not as robust as what the others have put together thus far; they are far more mature.
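
        For the curious, the discovery half of an ICE exchange boils down to asking a STUN server what your address looks like from the outside, then trading candidates over the mesh. A rough sketch with pion/stun, using Google's public STUN server as an example rather than anything the project is committed to:

        ```go
        package main

        import (
            "fmt"
            "log"

            "github.com/pion/stun"
        )

        func main() {
            // Ask a public STUN server what our endpoint looks like from outside.
            c, err := stun.Dial("udp4", "stun.l.google.com:19302")
            if err != nil {
                log.Fatal(err)
            }
            defer c.Close()

            msg := stun.MustBuild(stun.TransactionID, stun.BindingRequest)
            if err := c.Do(msg, func(res stun.Event) {
                if res.Error != nil {
                    log.Fatal(res.Error)
                }
                var addr stun.XORMappedAddress
                if err := addr.GetFrom(res.Message); err != nil {
                    log.Fatal(err)
                }
                // This is the candidate a node would advertise to its peer.
                fmt.Printf("public endpoint: %s:%d\n", addr.IP, addr.Port)
            }); err != nil {
                log.Fatal(err)
            }
        }
        ```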

        The final question I keep asking myself is just how much more I want to put into the core code. I've added a so-called "plugin architecture" that in theory should allow people to handle most use cases not intrinsically supported. But a lot of the juice in offerings like this is all the magic that takes place underneath, and that comes at the expense of more complication in the code. So where those lines are is still a bit blurry for me.
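
        As a loose illustration of the idea (these names are hypothetical, not the actual Webmesh plugin API), the point is to keep the core small and push environment-specific tricks behind an interface:

        ```go
        // Hypothetical sketch only; the real plugin API lives in the repo
        // and may look nothing like this.
        package plugin

        import "context"

        // Endpoint is a candidate address a node can be reached at.
        type Endpoint struct {
            Address string
            Port    int
        }

        // Connector lets a plugin contribute connectivity the core doesn't
        // implement itself (relays, hole punching, cloud-specific tricks).
        type Connector interface {
            // Name identifies the plugin in logs and configuration.
            Name() string
            // Discover returns endpoints this plugin can establish for nodeID.
            Discover(ctx context.Context, nodeID string) ([]Endpoint, error)
        }
        ```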

        2 votes
        1. [4]
          vord

          I always point to the inaugural blog post from the creator of Zerotier: Decentralization: I Want to Believe.

          I very much do like your idea of remaining as simple as possible. This particular problem space does lend itself to rapidly increasing complexity in trying to make the experience better.

          2 votes
          1. [3]
            tinyzimmer

            This is a fantastic read! Thanks!

            I'm kind of in the same boat as the author I think. Decentralization of this project would be a side effect of the way it is built, but not necessarily a core feature. That's mainly because the most common use cases would ultimately involve some level of centralization to begin with.

            I think I illustrate that best in my pseudo "site-to-site" example. What you really have is just a fault-tolerant, centralized network. Call the "leader" nodes your MPLS racks basically. But like the author, I guess I want to believe lol.

            As the post frames the problem: "Let's say we want to make the ZeroTier peer-to-peer network completely decentralized, eliminating central fixed anchor nodes. Now we need a fast, lightweight distributed database where changes in network topology can rapidly be updated and made available to everyone at the same time. We also need a way for peers to reliably find and advertise help when they need to do things like traverse NAT or relay."

            So I hope I'm providing exactly that with the in-memory SQLite database, periodic Raft snapshots, and STUN/TURN capabilities. I don't think typical network needs will produce a schema large enough to outgrow SQLite; the system would hit other limits first. In my test environments it has held up quite well, but it definitely needs to be battle-tested.
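
            To sketch the shape of that storage (heavily simplified, using hashicorp/raft-style wiring; IDs, paths, and timings are example values and the real code in the repo differs): logs and stable state stay in memory, and only periodic snapshots touch disk.

            ```go
            package meshstore

            import (
                "os"
                "time"

                "github.com/hashicorp/raft"
            )

            // openRaft wires up a node whose only durable state is its snapshots.
            func openRaft(fsm raft.FSM) (*raft.Raft, error) {
                cfg := raft.DefaultConfig()
                cfg.LocalID = raft.ServerID("node-1") // example ID

                // Logs and stable state live in memory only.
                logs := raft.NewInmemStore()
                stable := raft.NewInmemStore()

                // Snapshots are the periodic flush to disk; keep the last two.
                snaps, err := raft.NewFileSnapshotStore("/var/lib/webmesh", 2, os.Stderr)
                if err != nil {
                    return nil, err
                }
                cfg.SnapshotInterval = 30 * time.Second // how often to check
                cfg.SnapshotThreshold = 64              // min new entries before snapshotting

                trans, err := raft.NewTCPTransport("127.0.0.1:9443", nil, 3, 5*time.Second, os.Stderr)
                if err != nil {
                    return nil, err
                }
                return raft.NewRaft(cfg, fsm, logs, stable, snaps, trans)
            }
            ```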

            1. [2]
              vord

              Depends on the size of the network and how rapidly everything changes. And how much you'd be willing to leverage external services like DNS.

              Primarily intended for stationary home/office computers? You're probably OK.

              Primarily intended for cellphones or ephemeral cloud servers? Things get dicey quick.

              A hub-and-spoke model, where a user pairs their mobile device with a stationary one (which has a DNS entry to cover sporadic IP changes), could help.

              1. tinyzimmer

                This is one of the areas where I think "peering" meshes could help. One of the hard limitations of using WireGuard (and really ICE negotiation in general) is that you will eventually hit a system limit, be it a maximum number of peers or running out of local ports for relaying traffic.

                There is already the concept of MeshDNS that can be optionally exposed on nodes. It's pretty much a requirement, though, if I explore the independent-but-peerable meshes route. Each mesh has a domain where a node is resolvable as {node_id}.{mesh_domain}. But it is a very juvenile implementation currently; no concept of forwarding or any of that.
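
                From a client's perspective, that resolution is just pointing a resolver at the node's MeshDNS listener. A small sketch (the listener address and mesh domain below are made up for the example):

                ```go
                package main

                import (
                    "context"
                    "fmt"
                    "log"
                    "net"
                    "time"
                )

                func main() {
                    // Send lookups to the node's MeshDNS listener instead of
                    // the system resolver. Address and domain are examples.
                    r := &net.Resolver{
                        PreferGo: true,
                        Dial: func(ctx context.Context, network, addr string) (net.Conn, error) {
                            d := net.Dialer{Timeout: 2 * time.Second}
                            return d.DialContext(ctx, "udp", "127.0.0.1:5353")
                        },
                    }
                    addrs, err := r.LookupHost(context.Background(), "node-1.example-mesh.internal")
                    if err != nil {
                        log.Fatal(err)
                    }
                    fmt.Println(addrs) // the node's mesh address(es)
                }
                ```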

                There isn't that much information going into the schema: address/key mappings and RBAC rules, pretty much. In a single mesh of 65k peers (which is currently a hard limit because of how I'm handling IPv6, but I want to fix that) it gets to be a couple of MB. The join logic makes sure that all currently connected nodes are caught up with the latest state before another mutation occurs. Whether that scales is a question for experimentation, I suppose.

                2 votes
  2. [3]
    Earthboom

    So I've done some reading on this subject, because neither your GitHub nor this post makes it immediately clear what this project does. By reading up on Tailscale, however, I have a better understanding. Correct me if I'm wrong, but here's my understanding of what you're doing with Webmesh.

    It's essentially scripting networking actions across operating systems in order to route traffic and set up subnets, DNS, and firewall rules, while maintaining a database of keys and connections from controller to controller, correct?

    So you leverage WireGuard, which means a WireGuard interface is created, configured, and pointed at the right WAN address and port, and firewall rules are created and modified. On the listening end, the same thing is happening, with a mini software router that sends packets to the local router and translates packets from the local router through the WireGuard tunnel.

    The database is kept on whatever serves as the controller and then mirrored on every client, so any client can then serve as a controller while all clients reconfigure themselves to point to the new host, more or less?

    I do this manually now with a pfSense router acting as a WireGuard host via plugin. My clients all connect home and exit via the router, as all traffic is tunneled. Everything can see everything no matter where I am.

    The benefit of this project is that there doesn't need to be a home network, and the host and client can be on whatever network with whatever router, correct?

    I noticed Tailscale bypasses local DNS filtering so long as the WireGuard port isn't blocked. Does Webmesh do this? How is it done? If I'm in a walled garden, I can't bypass it, VPN or not. Or are you referring to setting the local DNS to the host address, so it bypasses the local DNS filtering in the router?

    2 votes
    1. [2]
      tinyzimmer

      Yea, my documentation and pitch still leave much to be desired. Work in progress on that front.

      You are pretty much right on the money, though, with most of your assumptions, save towards the end. It is essentially scripting networking actions across devices, while maintaining a very small in-memory SQLite database (with periodic snapshots to disk) of public keys, routes, edges (I maintain nodes as a DAG where each edge represents a "direct peering"), etc.
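
      As a loose illustration of that bookkeeping (hypothetical structs for the sake of example; the real schema lives in the database, not in Go types like these):

      ```go
      package mesh

      // Edge is a direct peering between two nodes in the DAG.
      type Edge struct {
          From, To string
      }

      // Graph holds what the mesh knows about its members.
      type Graph struct {
          Keys  map[string]string // node ID -> WireGuard public key
          Edges []Edge
      }

      // NextHop picks which directly connected peer should carry traffic
      // for target: the target itself if we share an edge, otherwise the
      // first neighbor that does (a stand-in for a real path search).
      func (g *Graph) NextHop(self, target string) string {
          neighbors := map[string]bool{}
          for _, e := range g.Edges {
              if e.From == self {
                  neighbors[e.To] = true
              }
              if e.To == self {
                  neighbors[e.From] = true
              }
          }
          if neighbors[target] {
              return target
          }
          for n := range neighbors {
              for _, e := range g.Edges {
                  if (e.From == n && e.To == target) || (e.To == n && e.From == target) {
                      return n
                  }
              }
          }
          return "" // no route known in this sketch
      }
      ```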

      A WireGuard interface is created, configured, and pointed towards the peers the network says that node can reach. They'll be able to reach everyone else as well by default, but only through the node that they joined. This is probably best described in my site-to-site docker-compose example and (I hate hate hate hate having to direct people to the code) the test cases that validate the logic that decides who connects directly to whom. E.g. the star test case is a good, simple example of what happens.

      The database is kept on every node, whether it is the leader, a voter, or just an observer. Raft consensus is used to stream changes to everyone. When a leader drops, one of the other voters will jump into gear as the controller, depending on the configured election timeout. If nodes connecting can't reach the leader directly, they'll get proxied to it via the peers they can reach.

      Tailscale does a far better job on the hole-punching side than I do currently. They employ all sorts of crazy magic, and I'm on the fence about just how much of it I want to implement in the core code base. I've added a plugin architecture to hopefully spread things out more. As it stands, you are shit out of luck if you can't at least get out on gRPC to ask to join. Once you get over that hurdle, you are limited to either direct WireGuard traffic or a hole-punched ICE/STUN connection. Definitely lots of design to take place still in this area.
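
      The ask itself is a single gRPC call from the joining node's perspective; something like this, where the address is an example and the commented-out stub names are hypothetical rather than the real proto:

      ```go
      package main

      import (
          "log"

          "google.golang.org/grpc"
          "google.golang.org/grpc/credentials/insecure"
      )

      func main() {
          // All a new node strictly needs is one reachable gRPC endpoint to
          // ask to join; everything past that can be proxied through peers.
          conn, err := grpc.Dial("mesh.example.com:8443", // hypothetical address
              grpc.WithTransportCredentials(insecure.NewCredentials()), // use TLS in real life
          )
          if err != nil {
              log.Fatal(err)
          }
          defer conn.Close()

          // Hypothetical generated stub; the real proto lives in the repo:
          // client := v1.NewMeshClient(conn)
          // resp, err := client.Join(ctx, &v1.JoinRequest{Id: "node-1", PublicKey: "..."})
      }
      ```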

      Hope this answered your questions :)

      1 vote
      1. Earthboom

        It did! Thanks very much. I'm pretty familiar with most of the moving pieces, but never thought to combine them like you and Tailscale have. It's quite a feat of engineering to have a mastery of networking on that level and then script all of it in a dynamic way so there's redundancy in the event of a lost connection (and even to create an ad hoc proxy!). Very cool stuff!

        I don't currently operate out of many networks, but I wonder if you've thought about Azure/AWS? They have built-in tools for VPN routers that route traffic across multiple networks, so it's ubiquitous so long as you stay in the cloud. Things fall apart and prices go up the moment you punch holes in their firewalls and VPN outside their environment. I like the way WireGuard works as a pseudo-VPN, and a mesh system in the cloud would be sick: access your on-prem FS, access your VM FS in the cloud, and access your other cloud environment, all from any device, without needing to RDP into the cloud world.

        1 vote