Home-lab set-up ... Docker vs native servers? Pros and cons of each?
And as long as I'm asking ... nginx or Apache (or Caddy or whatever else you think is best).
I'm hosting a few web sites and services, but currently, everything is "out there" on VPSes. I want to bring it all in-house, go back to the old days of actually hosting websites out of my living room.
Towards that end, I am gradually upgrading and overhauling all the sites and services, fixing long-standing issues and inefficiencies in the config files, merging servers, etc.
I have never learned Docker. I've started to several times, worked with it a bit on a job once, used it a bit here and there; so I'm not clueless, but it would be a learning curve.
Also, I'm running one main service (Nextcloud) that officially only supports Apache -- there absolutely are nginx setup guides and tutorials and such, but those are all unofficial, experimental setups.
And I'm running another major service (Synapse), on nginx.
And I want to merge the servers, and choose one web server to host both of them, and I don't know which way to go there.
Thanks for any feedback.
I've been using Docker for my homelab for years. The biggest advantage for my single-server setup is how quickly I can spin services up and down as needed. Docker's portable, encapsulated nature gives me stable environments that are defined entirely by configuration. If something catastrophic happens, the configurations for all of my services are safe in a git repo. I don't think I'll ever go back at this point; it's just too easy to forget what all your changes were when dealing with a live system.
Yeah, encapsulation is the key as far as I’m concerned. Each service is its own “thing” - you can configure them separately, start/stop them separately, update the host OS separately, back up state separately, upgrade them separately, move them between machines separately.
There can sometimes be good reasons to want to run specific workloads on bare metal, but for most standard stuff it feels quite messy and brittle for everything on a given server to be tightly coupled to that machine’s hardware and OS configuration, and affected by other unrelated services just because they happen to be on the same system.
It doesn’t even specifically have to be docker, but running without some kind of per-service container/VM/isolated runtime/etc. has the same feeling for me as writing a piece of code in a single giant procedural block with a bunch of global variables. You can do that, but you’ll be much happier if things are nicely wrapped up in manageable chunks with good separation of concerns.
[Edit] And just to be explicit, this also covers the apache vs nginx question: nextcloud container running apache, synapse container running nginx, and no need to worry about wedging one or the other into a configuration where it doesn't quite fit.
What happens to databases? Synapse (eg) uses a db. Is the db also somehow virtualized into the Docker container? Can I still just log into the db and start throwing SQL queries at it?
And, actually, what about the website files, too? Is there still a proper, accessible site-root folder and subfile structure, where I can pop in and edit webpages and what-not?
This has always been my subconscious concern about Docker, that somewhere in the containerization process, the service gets bundled away in some fashion that makes it more difficult (or impossible) for me to granularly muck about inside the service's files.
Docker is, fundamentally, just a way to build reproducible environments for virtual machines. There is a lot more complexity you can use, and likely will learn as you learn docker, but the above statement is the only bit of knowledge you actually need.
The pedants in the audience are dying to say docker isn't actually a virtual machine. This is true, since it shares the kernel with the host system, but it doesn't make a practical difference in day-to-day usage.
With a properly set up docker service, nothing of value is ever stored in the container. The container files are not immutable, but the individual container can be destroyed at any time and replaced from scratch. So everything that is important, config files, database data directories, etc, is stored either on the host operating system’s file system, or in a docker volume. You can think of a volume like an iso file that just lives on the host file system. There are some reasons to choose volumes over the host file system, but mostly you can ignore volumes.
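To make that concrete, here's a hedged sketch of the two approaches, using an arbitrary Postgres container (the paths, names, and password are just illustrative):

```sh
# Bind mount: the data lives at a normal path on the host that you can browse and back up.
docker run -d --name db-bind \
  -e POSTGRES_PASSWORD=change-me \
  -v /srv/postgres/data:/var/lib/postgresql/data \
  postgres:16

# Named volume: Docker manages where the data is stored for you.
docker volume create pgdata
docker run -d --name db-vol \
  -e POSTGRES_PASSWORD=change-me \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16
```

Either way, destroying and recreating the container leaves the data untouched.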
In order to make a docker container, you have to have a Dockerfile. Here is an example from my audiobookcovers.com project, walked through step by step (a sketch of the corresponding Dockerfile follows the list). This is a multistage build, so it may look intimidating, but it's actually pretty simple. Here is a human-language version of what it does:
Make a new virtual machine and call it base. Make it based on the public node:24-slim image. That is basically just debian with node preinstalled.
Set some environment variables needed for pnpm.
cd into the /app directory
Copy the listed files into the app directory. This copies from the file system on your host computer by default, so it takes the files from the project directory.
Now make a new virtual machine, based on the base VM we just created.
Run `pnpm install --prod --frozen-lockfile`. The extra part just makes docker caching better, but is not required.
Now make a new VM, based on base, called build
Run `pnpm install --frozen-lockfile`
Copy all local project files into the VM
Run pnpm run build, which builds the final server files for the service
Now make a new VM from base. This one doesn’t get a specific name, since it is the output VM.
Copy all local project files to the VM
Copy the installed production node dependencies from that virtual machine we already built.
Copy the final build directory from the builder VM
Expose port 3000 and tell docker what command to run when starting the container.
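For reference, here's a rough sketch of what that Dockerfile could look like, reconstructed from the steps above. The stage names, copied file list, build output directory, and start command are assumptions, not the project's actual file:

```dockerfile
# Hedged reconstruction of the multi-stage build described above; details are illustrative.
FROM node:24-slim AS base
ENV PNPM_HOME="/pnpm"
ENV PATH="$PNPM_HOME:$PATH"
# Assumed way of getting pnpm onto the image.
RUN corepack enable
WORKDIR /app
COPY package.json pnpm-lock.yaml ./

FROM base AS prod-deps
RUN pnpm install --prod --frozen-lockfile

FROM base AS build
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm run build

FROM base
COPY . .
COPY --from=prod-deps /app/node_modules /app/node_modules
COPY --from=build /app/build /app/build
EXPOSE 3000
CMD ["node", "build/index.js"]
```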
When you set up a new server, you already have to do these things. Docker has a different syntax from the command line, but you still need to perform the same actions. If you have your server setup documented already, you can probably transition it to a dockerfile easily.
When you run `docker build -t audiobookcovers .`, docker will run all these commands and save the file system of the final VM. Now you can run `docker run --name audiobookcovers -p 3000:3000 audiobookcovers`, and the server is running. The `-p` option tells docker to connect container port 3000 to port 3000 on your local system. Want to "ssh" into the VM? In another terminal, run `docker exec -it audiobookcovers /bin/bash`. The `exec` command runs the command you give it inside the container, and the `-it` options give you an interactive terminal. Voila!

For this project, it doesn't need any specific runtime config or local storage; that all happens in a separate database server. But if you do need local storage, you can add something like `-v /home/server/config:/config -v /home/server/data:/data` to the run command. This just mounts local folders from your home directory into the container. If this were, for example, a SQL server, those folders would contain the data and config files. Every time you start, the container may be different, and the server would think it is starting for the first time, but it picks up the config and data without an issue. So if something goes wrong with the container, nuke it from orbit and make a new one. This is what makes docker good.

If you know how to manage virtual machines, you already know how to manage docker. You just have to learn the slight syntax changes.
Databases can be run in their own containers too. You can access the container's shell easily and do whatever you want inside. The popular databases have official images available on Docker Hub. With port binding, you can connect any tool running locally, whether it's a GUI or CLI, to the container.
Personally, I always use docker images to spin up databases for my development needs. It's so much less hassle.
There's a nice guide on working with databases here.
Regarding website files: you could do that, but I feel it's better to rebuild the container image when you need to make changes. There's also the option to bind-mount a local directory into the container's file system; then you can change the files locally and the changes will be reflected in the container.
I like to think of a single container as equivalent to a single VPS or cloud VM. You can either have your DB and web server for Synapse running together in a single container, or run separate `synapse-web` and `synapse-db` containers (ideally managed together with docker compose), or potentially even run `synapse-web` as one container alongside a separate general DB container, if you had a reason to want a single MySQL or Postgres server that hosts separate databases for multiple other services.

If it were me I'd go for the second option, because I strongly prefer modularity and separation where possible, but there are arguments to be made for all three!
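As a hedged sketch of that second option (the image tags, password, and paths are placeholders rather than a ready-to-use Synapse config; check the Synapse docs for the real settings):

```yaml
services:
  synapse-web:
    image: matrixdotorg/synapse:latest
    ports:
      - "8008:8008"
    volumes:
      - ./synapse-data:/data        # homeserver config and media live on the host
    depends_on:
      - synapse-db

  synapse-db:
    image: postgres:16
    environment:
      POSTGRES_USER: synapse
      POSTGRES_PASSWORD: change-me  # placeholder secret
      POSTGRES_DB: synapse
    volumes:
      - ./synapse-db:/var/lib/postgresql/data
```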
In terms of file structure, you can easily mount paths on the host machine at arbitrary other paths on the container - so for things like the actual website files and (generally, but not always) service configuration those will just exist as normal on your host machine and be shared into the container.
For files that make up the actual service stack - source files for the service itself, dependencies, binaries, etc. - those will be part of the container image. Those can’t be easily edited in an ad hoc way, but I see that as containerisation enforcing good practice: you shouldn’t be ad hoc editing the code, you want to be properly versioning that and deploying a new container image as needed.
It's possible but heavily discouraged to run multiple processes in the same docker container.
Instead of container == VPS, I would say container == service.
Maybe those are similar to you but I treat long-living VPSs a lot differently than I treat cloud VMs (even though they are essentially the same thing--I would put all my services on the same VPS but with clouds like AWS/GCP I usually keep services more isolated from each other)
Yeah, this is good advice!
Honestly I was probably unintentionally muddying things a little there - I was going for completeness, because there are more unusual use cases like distrobox where the one container per service approach tends to bend a little just based on the way it’s designed, and some situations I’ve run into semi-regularly in the past where specific hardware passthrough to containers makes it necessary to allocate more than one process to a single one for things to work properly.
But that’s well into edge case territory, so it’s more helpful to focus on the primary thing you should do than the things you technically can do.
I'm pretty sure most DBs have container images available. I know for sure MariaDB, SQL Server, Postgres, and Mongo do. Set up correctly, you can connect to a containerized DB just like you would a DB running natively and you wouldn't even notice the difference.
You can access a shell in a container and get access to the files within the container. Depending on what the container image is based on, you may have utilities like Nano. This wouldn't really be the best way to edit a site though. Containers are meant to be immutable so that you can destroy the container and recreate from an image and get the same thing each time. You should probably stick with editing the site's source code and building a new image for changes*. You would then pull the image and destroy the container and deploy a new one based off of the new image. For some things like config tweaks you would map a directory to the container and edit files there from the host system.
* If you're building your own custom images for your own services you will probably want to also run your own container registry. I use Forgejo (a fork of Gitea) for this but there are other options too. Naturally, this is running in yet another Docker container.
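A hedged example of that workflow, with a placeholder registry hostname, image name, and tag:

```sh
# Build a new image for the site, tag it for the self-hosted registry, and push it.
docker build -t git.example.com/me/mysite:1.4 .
docker push git.example.com/me/mysite:1.4

# On the server: pull the new image, then replace the running container with a fresh one.
docker pull git.example.com/me/mysite:1.4
docker rm -f mysite
docker run -d --name mysite -p 8080:80 git.example.com/me/mysite:1.4
```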
Car analogy time!
If the application is a car, the container is the engine. You can swap the engine to a newer one, but the seats and frame (your data) stay in the same place. Even if the engine breaks down, you can still just open the door and get your stuff from the back seat and glove box.
It's considered very bad form to create docker images that store data inside the container, some even go as far as mounting the whole filesystem inside the container as read only so nothing can even accidentally create anything in there.
There is a term Code Smell in programming circles.
I consider it an "application smell" if it can't be containerised in any way and "must" be run on bare metal. There are some fundamental design errors in there that I don't want to deal with at all.
Yes, there are exceptions to this, but they are few and far between. Not something a regular selfhoster should encounter more than once. Home Assistant is on the edge of this, some say it's "better" as a VM. I've run it in Docker but encountered one env where I couldn't get it to work in anything except a VM due to networking weirdness. (A Synology NAS)
I can give a big plus one to HomeAssistant running better in a VM. I think it’s just because it is the only foss project that is targeting users who don’t yet have the understanding of Docker. Basically all self hosted type software assume some level of docker proficiency, even if it’s just enough to launch an image with portainer. HAOS install works so well because they handle everything that the user has to handle in a standard docker style deployment. Need an mqtt server? Click the install button. Want zigbee2mqtt? Click the install button. I think it is theoretically possible to do a similar setup with bare docker through passing the docker socket or docker in docker, but that is likely more complicated than just an OS level install. Basic users won’t need other docker containers, and advanced users can use a VM or manage their own docker containers.
I like not dealing with another layer in my setups, so I’ve chosen to do native. I never run into issues of services biting each other. Major OS upgrades are usually a good time to reevaluate my setup, go to new major versions of the services I’m running and revisit the configurations. I keep a git repo of config files as a backup, without secrets.
I have (checks Unraid dashboard) 57 docker containers running at the moment.
I can't fathom the amount of work or the massive server capacity required to run them all as separate VMs, each with its own memory, disk and CPU resources.
Never mind having to do 57 OS updates regularly.
Now I can just have watchtower update them automatically, or click one button in the UI and it'll update either one of them or all of them.
Not sure why you would do that. The alternative is just running them on a single machine.
And that's exactly what I don't want to do and would kinda defeat the purpose of having a fancy VM environment? It's like buying a Ford F150 and just using it to drive to your office job from the suburbs and back :D
Way too many applications don't play well with others: hard-coded ports, hard-coded configuration locations, and specific requirements for Python libraries.

It's just so much simpler to put each of them in a container separately, so they can have whatever port they want and install specific versions of libraries to their hearts' content. With Docker I can adjust the actual port without having to fight the application.
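For example, a sketch with an arbitrary image, where the only point is the port mapping:

```sh
# Two apps that both insist on port 80 inside their containers,
# published on different host ports without touching either app's config.
docker run -d --name site-a -p 8081:80 nginx
docker run -d --name site-b -p 8082:80 nginx
```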
Look, I don’t know your setup and you don’t know mine. But I don’t quite agree with your arguments.
VMs make the life of the host easier. They share hardware between multiple VMs and managing them is a lot easier than managing hardware for individual customers. From your perspective as a user of the VM, I don’t really see much of a benefit besides not having to deal with OS maintenance. From that standpoint, your analogy with the truck makes little sense to me.
I too run multiple python services. With pyenv I don’t run into any dependency issues. I’m curious why that wouldn’t work for you.
Yes, ports need to be selected. Not sure how that wouldn’t be configurable, but clearly you have applications that don’t allow it.
Again, I don’t know what you are running and you probably have a good reason to need containers. For me it has never been a necessity and I can configure applications to run next to each other without problems.
There are two almost separate parts of docker. One is using prebuilt docker images and running them on your own server. The other is actually building your own docker images. From your post, I am not quite sure which you are going for.
If you already know how to set up Apache/whatever other hosting software you use on a VM, using prebuilt images doesn't have much advantage for you. In this context, the main advantage is for the software maintainer. They don't have to worry about whether your OS bundles Python 3.12 instead of 3.11 and breaks their software. The OS can have whatever Python it wants, and the docker image has its own version of Python. The only OS requirement for a docker image is that it runs a Linux kernel and has a docker-compatible runtime installed. It does give you a small advantage vs splitting up a homelab server into VMs: resource sharing. With a docker setup, a single docker container can use exactly and only the resources it needs. With VMs, you have to decide memory and CPU resource limits ahead of time. This is the major factor in why I like using docker. That, and a lot of homelab software is already distributed as docker images.
If you are planning to build your own images, there are more substantial benefits to docker. The biggest is that you don't need to care what your target server looks like. This service works best with Debian 12? Easy, just start with `FROM debian:12`. Other service only works on Alpine? `FROM alpine`. And now you can easily run them all on the exact same computer without any issue. Want to change the host OS on your server? As long as it supports docker, everything will work just fine. Want to not have to worry about OS hardening or security patches? Install Flatcar Linux or a similar pre-hardened OS. Your existing docker images will work just fine because they don't depend on anything from the server OS.
Whatever you do, I highly recommend you learn docker compose. I personally have a problem where I forget how I set up servers and then have to relearn it when I need to maintain them. Compose files and Dockerfiles make all the configuration a file on disk, so I don't need to remember anything.
One thing that might help is docker compose. You define all your services and start/stop them at the same time.
It's a lot easier than typing in long docker commands manually. Another thing that might help is Dockge or Portainer, which are GUIs for docker.
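A hedged sketch of the day-to-day commands, assuming a compose file in the current directory and a service named app (both placeholders):

```sh
docker compose up -d          # create/start everything defined in the compose file
docker compose logs -f app    # follow the logs of one service
docker compose pull && docker compose up -d   # update images and recreate changed services
docker compose down           # stop and remove the containers (bind mounts/volumes stay)
```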
I would try to keep apps like this isolated as much as possible. From a webdev perspective, nginx has much more modern and sane configuration. I would run Nextcloud in docker with the recommended software (Apache, if so) and do a reverse proxy with nginx in front, if necessary. Same with Synapse.
If you're going through the trouble of documenting all the setup commands in VCS (which you should do regardless) it's honestly not that much more work to deal with mountpoints (or docker volumes) and container networking. The main benefit is that it feels a lot more automated and it can be a bit more secure (though if you use SELinux there might be some additional head-scratching every now and then [even with podman instead of docker]).
Running stuff natively, without docker, can be a bit easier (when things don't have authoritative docker images already). Things also run a little bit faster. If you are running a load balancer which is moving a lot of traffic I would keep that as its own machine (and also outside of docker). But with real-world application scenarios where you only have tens or hundreds of concurrent visitors per minute you're not going to notice the performance difference much.
Before Docker, every major OS upgrade was an issue, especially if you tried software that didn't come from the OS package manager. After Docker, the OS is just the platform for running Docker itself, and upgrades become a lot safer. The only problem is that by default, when you `apt-get upgrade`, Docker gets upgraded too (it's just another package) and all of your containers are forcefully terminated while the Docker daemon restarts. I'm not sure if keeping containers alive across daemon restarts is out of beta, but it is not currently on by default in the latest Debian.

I use Docker within a Debian VM in Proxmox. The VM is because it's dead simple to do regular backups of the whole VM and then restore it if I somehow really break things.
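If I'm thinking of the same thing, that's Docker's live-restore option. A hedged sketch of enabling it (double-check the docs for your Docker version, and merge the setting into any existing daemon.json instead of overwriting the file):

```sh
# Keep containers running while dockerd itself restarts (e.g. during a package upgrade).
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "live-restore": true
}
EOF
sudo systemctl reload docker   # live-restore can be applied with a daemon config reload
```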
Docker all the way when you can. It's super easy to config and spin up new services. If you need multiple instances of the same service you just copy your config and tweak it to point at different volumes and use different ports.
I use Caddy for my reverse proxy. Each entry only needs a few lines but lately I've been finding it to be almost too simple. I used Nginx Proxy Manager in the past but switched when development was stagnant. I believe there's an active fork now. I briefly checked out SWAG but didn't like how much configuration a simple service could require. If I were starting now I'd seriously look at Traefik and I may end up switching to it anyway. A lot of projects can integrate nicely with Traefik through its Docker label system.
I use Adguard for ad blocking and local DNS resolution. I used to use PiHole but switched because PiHole's local DNS resolution was a bit clunky at the time.
Edit: As for Nextcloud, they have what they call an AIO (all-in-one) Docker image. With a solid understanding of Docker it was pretty simple to set up. Yes, it technically uses Apache, but I still have that sitting behind my Caddy reverse proxy.
+1 for Traefik! My environment is predominantly Docker containers with a few VMs and bare-metal hosts, and Traefik handles it all with ease. The ability to use labels on containers to define the parameters for Traefik is so nice: it's the best of SWAG (or whatever it used to be called before they got told to change it) but requires none of the complexity if you don't want it. If you put everything behind something like Authelia, you can define that once in your Traefik entry points and have reverse proxying enabled with as little as one traefik.enable label. Or you can go wild and manually tweak every little service, like you'd expect with all the conf files SWAG gives you (assuming that's still how it works; I've been on Traefik for about five years now).
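For a sense of what the label-based config looks like, a hedged sketch (the hostname, router name, and entrypoint name are placeholders, and this assumes Traefik's Docker provider is already configured):

```yaml
services:
  whoami:
    image: traefik/whoami   # tiny demo service
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.whoami.rule=Host(`whoami.example.com`)"
      - "traefik.http.routers.whoami.entrypoints=websecure"
```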
Backups are easy if the containers are well built. Basically everything that changes is outside of the container, then you mount a separate directory for it. Usually config/ or something.
Then you back that up along with the compose.yml file.
Depending on the service, restoring from a backup can be as simple as copying the compose file and the config directory back over and running `docker compose up -d`.
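A hedged sketch of what that backup can look like (the paths and service name are placeholders, and databases usually want a proper dump first, as discussed elsewhere in the thread):

```sh
# Stop the stack briefly so files aren't changing mid-copy, then archive the compose file and config.
cd /srv/myservice
docker compose stop
tar czf /backups/myservice-$(date +%F).tar.gz compose.yml config/
docker compose start
```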
My setup is proxmox on a used HP workstation with a Xeon CPU, 64GB RAM (I have another 64GB but didn’t install it as power isn’t cheap where I live), 2x8TB (zfs mirror) for storage, and 250 GB SSD for the OS.
On Proxmox, I have LXC containers running various things that I configure with Ansible. I manually provision the containers because I couldn't find a good way to do it with Terraform. Same for VMs.
I have a large VM that runs docker and portainer for more and more things. Portainer has annoying bugs and mostly works but I would replace it if I knew something better. They give a free licence for hobbyists.
I use traefik which took quite some effort to setup because of their lacklustre docs. Nothing is publicly accessible (except paperless generated URLs when I share documents), all the rest is local only and I use WireGuard on my devices to access.
Main problem for me is backups. It's very clunky with docker. Each "app" is basically a docker compose stack of its components plus a backup service that runs docker and executes commands in the containers, then puts the backup files at a known destination. The Proxmox server is configured to back all that up once a day to rsync.net with borgmatic, and I use healthchecks.io to get alerts when the backups didn't run.

It works, but I don't love the kludgy backup system. I couldn't find a better way, since Postgres for example needs commands run to take a consistent dump for backup. I also have little observability into the services and struggle with monitoring. And provisioning new VMs or containers also has to be done manually.
Why not just bind mount the data to the main VM filesystem? This seems unnecessarily complicated.
Because most applications have a way to dump data for backup. And for Postgres you can't just copy the db files; you have to take a consistent dump, which is only possible via the pg_dump tool. You can roll the dice and just copy the db files without doing a pg_dump, and restoring from them will work, until it won't.

Also, borg's philosophy isn't to dump entire disks but just the actual files you need to reconfigure the service/machine.
FYI if you are in the US, you likely have free access to LinkedIn Learning via your library, which has a whole course on Docker that IIRC will even give you a certificate. And even then, Docker is simple enough to learn that I'm not certain that you even need to take such a course.