In which a foolish developer tries DevOps: critique my VPS provisioning script!
I'm provisioning two mirrored environments, staging and production, for a SaaS application we're close to launching as a company, and I'd like feedback on the provisioning script I've written. It takes a default VPS from our hosting provider, DigitalOcean, and readies it as a secure hosting environment for our application instance (which runs inside Docker and persists data to a separate managed database).
I'm sticking with a simple infrastructure architecture for now: a single VPS running both nginx and the application instance as a containerised Docker service, as mentioned above. There are no load balancers or duplicated servers at this point. @Emerald_Knight very kindly gave me some overall guidance in the Tildes Discord about what to aim for when configuring a server (limit damage as much as possible, limit access if an attack occurs), so I've tried to be thoughtful and apply that thinking where possible (disabling root login, etc.).
I'm not a DevOps or sysadmin person by trade (I stick to programming most of the time), but the role falls to me as the technical person in this business, so the last few days have been a lot of reading and readying. I'll run through the provisioning flow step by step. Oh, and for reference, the OS is Ubuntu 20.04 LTS.
The first step is self-explanatory: update the package index and install nginx.
#!/bin/sh
# Name of the user to create and grant privileges to.
USERNAME_OF_ACCOUNT=
sudo apt-get -qq update
sudo apt install -qq --yes nginx
sudo systemctl restart nginx
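nginx only gets its stock configuration here; the plan is for it to sit in front of the application container as a reverse proxy. A rough sketch of that server block (the domain and port are placeholders, not our real values):
sudo tee /etc/nginx/sites-available/app >/dev/null <<'EOF'
# Placeholder reverse proxy for the application container.
server {
    listen 80;
    server_name app.example.com;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
EOF
sudo ln -sf /etc/nginx/sites-available/app /etc/nginx/sites-enabled/app
sudo nginx -t && sudo systemctl reload nginx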
Next, create my sudo user, add them to the required groups, require a password change on first login, then copy across any authorised keys from the root user (these can be configured to be seeded to the VPS in the DigitalOcean management console).
useradd --create-home --shell "/bin/bash" --groups sudo,www-data "${USERNAME_OF_ACCOUNT}"
passwd --delete "${USERNAME_OF_ACCOUNT}"
chage --lastday 0 "${USERNAME_OF_ACCOUNT}"
HOME_DIR="$(eval echo ~${USERNAME_OF_ACCOUNT})"
mkdir --parents "${HOME_DIR}/.ssh"
cp /root/.ssh/authorized_keys "${HOME_DIR}/.ssh"
chmod 700 "${HOME_DIR}/.ssh"
chmod 600 "${HOME_DIR}/.ssh/authorized_keys"
chown --recursive "${USERNAME_OF_ACCOUNT}":"${USERNAME_OF_ACCOUNT}" "${HOME_DIR}/.ssh"
sudo chmod -R 775 /var/www
sudo chown -R "${USERNAME_OF_ACCOUNT}" /var/www
rm -rf /var/www/html
Install Docker, run it as a service, and ensure the created user is added to the docker group.
sudo apt-get install -qq --yes \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository --yes \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get -qq update
sudo apt install -qq --yes docker-ce docker-ce-cli containerd.io
# Only add a group if it does not exist
sudo getent group docker || sudo groupadd docker
sudo usermod -aG docker "${USERNAME_OF_ACCOUNT}"
# Enable docker
sudo systemctl enable docker
sudo curl -L "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
docker-compose --version
Disable root logins and any form of password-based authentication by altering `sshd_config`.
sed -i '/^PermitRootLogin/s/yes/no/' /etc/ssh/sshd_config
sed -i '/^PasswordAuthentication/s/yes/no/' /etc/ssh/sshd_config
sed -i '/^ChallengeResponseAuthentication/s/yes/no/' /etc/ssh/sshd_config
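One caveat I'm aware of: those sed edits only change directives that already appear uncommented in sshd_config. An alternative I'm considering is a drop-in file, since 20.04's stock sshd_config should include /etc/ssh/sshd_config.d/*.conf (worth verifying on the actual image). A sketch:
sudo tee /etc/ssh/sshd_config.d/99-hardening.conf >/dev/null <<'EOF'
# Hardening overrides: no root login, no password-based authentication.
PermitRootLogin no
PasswordAuthentication no
ChallengeResponseAuthentication no
EOF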
Configure the firewall and fail2ban.
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow http
sudo ufw allow https
sudo ufw reload
sudo ufw --force enable && sudo ufw status verbose
sudo apt-get -qq install --yes fail2ban
sudo systemctl enable fail2ban
sudo systemctl start fail2ban
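fail2ban runs with its stock settings for now. If I do tighten it later, my understanding is that overrides belong in a jail.local rather than editing jail.conf; a minimal sketch with placeholder ban values:
sudo tee /etc/fail2ban/jail.local >/dev/null <<'EOF'
# Placeholder values for the sshd jail; jail.conf itself stays untouched.
[sshd]
enabled  = true
maxretry = 5
findtime = 10m
bantime  = 1h
EOF
sudo systemctl restart fail2ban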
Swapfiles.
sudo fallocate -l 1G /swapfile && ls -lh /swapfile
sudo chmod 0600 /swapfile && ls -lh /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile && sudo swapon --show
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
Enable unattended updates, and restart the SSH daemon.
sudo apt install -qq --yes unattended-upgrades
sudo systemctl restart ssh
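From what I've read, the actual schedule is controlled by /etc/apt/apt.conf.d/20auto-upgrades; writing it explicitly makes the intent clear (this mirrors what the package sets up by default, as far as I can tell):
sudo tee /etc/apt/apt.conf.d/20auto-upgrades >/dev/null <<'EOF'
# Refresh package lists and run unattended-upgrade once a day.
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF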
Some questions
You can assume these questions are cost-benefit focused, i.e. is it worth my time to investigate this, versus something else that may have better gains given my limited time.
- Obviously, any critiques of the above provisioning process are appreciated—both on the micro level of criticising particular lines, or zooming out and saying “well why don’t you do this instead…”. I can’t know what I don’t know.
- Is it worth investigating tools such as `ss` or `lynis` (https://github.com/CISOfy/lynis) to perform server auditing? I don't have to meet any compliance requirements at this point.
- Do I get any meaningful increase in security by implementing 2FA on login here using Google Authenticator? As far as I can see, as long as I'm using best practices to actually `ssh` into our boxes, the likeliest risk profile for unwanted access probably isn't via the authentication mechanism I use personally to access my servers.
- Am I missing anything here? Beyond the provisioning script itself, I adhere to best practices around storing and generating passwords and SSH keys.
Some notes and comments
- Eventually I'll use the hosting provider's API to spin up and spin down VPSs on the fly via a custom management application, which gives me an opportunity to programmatically execute the provisioning script above and run some other pre- and post-provisioning steps, like deployment of the application and so forth (there's a rough sketch of the API call after this list).
- Usage alerts and monitoring are configured within DigitalOcean's console, and alerts are sent to our business' Slack for me to action as needed. Currently, I'm settling on the following alerts:
- Server CPU utilisation greater than 80% for 5 minutes.
- Server memory usage greater than 80% for 5 minutes.
- I’m also looking at setting up daily fail2ban status alerts if needed.
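For reference, the rough shape of that droplet-creation call (the region, size, and image slugs are placeholder values, and the provisioning script above would be passed as cloud-init user data):
# Hypothetical values; the token comes from the DigitalOcean control panel.
curl -X POST "https://api.digitalocean.com/v2/droplets" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DIGITALOCEAN_TOKEN}" \
  -d '{
    "name": "staging-01",
    "region": "lon1",
    "size": "s-1vcpu-1gb",
    "image": "ubuntu-20-04-x64",
    "ssh_keys": ["aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99"],
    "user_data": "#!/bin/sh\n# provisioning script goes here\n"
  }'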
You can also use the `#!/bin/bash -euo pipefail` shebang to do all this in a single line.

If you're concerned about POSIX support, `-o pipefail` is a bash-ism (IIRC) so you might want to avoid it.

The problem with that is that typing `bash script.bash` won't pick up on that. I'd strongly recommend against using it, as changing behaviour depending on how you run the script is going to be very confusing.

Fair enough. I usually execute all my scripts directly (`./script.sh`) so I never experienced that.

My general strategy with these kinds of scripts is to make them repeatable: running it twice should work. The advantage of this is that if you add something later or if it errors out half-way you can just run the script again without having to muck about or comment out selected lines.
So, for example, instead of just `useradd` I'd do something like:
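# Sketch of the idea: only create the user when it doesn't already exist, so re-runs are harmless.
if ! id "${USERNAME_OF_ACCOUNT}" >/dev/null 2>&1; then
    useradd --create-home --shell /bin/bash --groups sudo,www-data "${USERNAME_OF_ACCOUNT}"
fi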
And instead of `ln -s` I'd use `ln -sf` so it won't error out if the link already exists (or updates it if the location changed). Some similar stuff applies to various other places.

I'd also just run the entire thing as root instead of `sudo` anywhere. You can always `su` to another user from there if need be. It's just easier.

I never understood why people use ALL_CAPS for variable names in scripts; no other language does that. It's a bit of a stylistic thing, but I only use ALL_CAPS for environment variables rather than local variables (similar to how ALL_CAPS is used for globals in C).
If you use a key with a passphrase then you already have 2FA: a passphrase you know and a key you have.
I'll second the recommendation for Ansible. There's a variety of config-management tools like it; Chef/Puppet/Ansible/Salt are probably the four most well-known ones, but I think Ansible is the easiest one to get started with if you've never used any of them.
Here's a real-world example, sketched in outline below; it installs Docker and enables it, though for CentOS and not Ubuntu:
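# Outline only: the package and service names are assumed (CentOS's own docker package).
- name: Install Docker
  yum:
    name: docker
    state: present

- name: Enable and start Docker
  systemd:
    name: docker
    enabled: yes
    state: started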
What Ansible does is turn each of those tasks into a small Python script, SCP it to the target host, and execute it (or just execute it locally if you're running in "local" mode). The scripts Ansible generates are carefully written to make sure they're no-ops if the thing they're trying to do is already done (the five-dollar word for this is idempotent).
The big benefit of this comes when you try to start upgrading. Say you're running Docker version X, and want to upgrade to Y. You update your provisioning script to install version Y on newly-provisioned machines. Can you re-run that provisioning script on an already-running machine and have it upgrade Docker? Are you confident that doing so won't have any adverse effects? Tools like Ansible handle that for you.
When you get to the point of wanting to spin up VMs automatically in response to demand, take a look at Packer, which can build DigitalOcean VM templates and provision them using Ansible. That means, rather than starting up a bare Ubuntu VM and doing all this provisioning "live" (while your app is under load and you want the new server to start serving traffic ASAP), you can do the provisioning ahead of time, save it to a template, then just have DO launch that template. If you want, you could even include that in your CI process, so that each "release" includes a pre-built VM template with your service already bundled, that can start serving real traffic as soon as it's launched.
Various other feedback:
I wouldn't recommend turning on unattended upgrades. For one thing, the upgrade schedule is typically randomized, to avoid every machine running auto-upgrades waking up at midnight Saturday or whatever and hammering the Ubuntu update server. So it's a coin flip whether your production server will get auto-upgraded before the staging server. And with a single production server, you're signing up for possible unscheduled downtime when the upgrade happens.
The standard SSH hardening you've done (disabling root, disabling passwords, running fail2ban) is very likely sufficient, and I don't think you need Google Authenticator for 2FA. However, make sure you enable 2FA for things like your Digital Ocean login.
Is your app a single Docker container? If so I think docker-compose is a bit overkill. You can just write a systemd service that does `docker run` for you (a rough sketch of such a unit is below).

Some will disagree, but I like to change the default SSH port. Especially with f2b, your logs will be cleaner since you won't have all of the lamers who look for 22 throwing errors.
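Sketching the systemd-per-container idea mentioned above (the image name, port, and service name are placeholders):
sudo tee /etc/systemd/system/myapp.service >/dev/null <<'EOF'
# Hypothetical single-container service wrapping docker run.
[Unit]
Description=My application container
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f myapp
ExecStart=/usr/bin/docker run --rm --name myapp -p 127.0.0.1:8080:8080 myorg/myapp:latest
ExecStop=/usr/bin/docker stop myapp
Restart=always

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now myapp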
No xclip?