9 votes

In which a foolish developer tries DevOps: critique my VPS provisioning script!

I'm provisioning mirrored staging and production environments for a SaaS application that our company is close to launching, and I'd like feedback on the provisioning script I've written. It takes a default VPS from our hosting provider, DigitalOcean, and readies it as a secure hosting environment for our application instance (which runs inside Docker and persists data to a separate managed database).

I'm sticking with a simple infrastructure architecture at the moment: a single VPS which runs both nginx and the application instance inside a containerised Docker service, as mentioned earlier. There are no load balancers or server duplication at this point. @Emerald_Knight very kindly gave me some overall guidance in the Tildes Discord about what to aim for when configuring a server (limit damage as best as possible, limit access when an attack occurs), so I've tried to be thoughtful and integrate that paradigm where possible (disabling root login, etc.).

I’m not a DevOps or sysadmin-oriented person by trade (I stick to programming most of the time), but this role falls to me as the technical person in this business, so the last few days have been a lot of reading and readying. I’ll run through the provisioning flow step by step. Oh, and for reference, this is Ubuntu 20.04 LTS.

First step is self-explanatory.

#!/bin/sh

# Name of the user to create and grant privileges to.
USERNAME_OF_ACCOUNT=

sudo apt-get -qq update
sudo apt install -qq --yes nginx
sudo systemctl restart nginx

Next, create my sudo user, add them to the groups needed, require a password change on first login, then copy across any provided authorised keys from the root user which you can configure to be seeded to the VPS in the DigitalOcean management console.

useradd --create-home --shell "/bin/bash" --groups sudo,www-data "${USERNAME_OF_ACCOUNT}"
passwd --delete $USERNAME_OF_ACCOUNT
chage --lastday 0 $USERNAME_OF_ACCOUNT

HOME_DIR="$(eval echo ~${USERNAME_OF_ACCOUNT})"
mkdir --parents "${HOME_DIR}/.ssh"
cp /root/.ssh/authorized_keys "${HOME_DIR}/.ssh"

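# Set permissions on the new user's .ssh directory (~ would point at the invoking user's home).
chmod 700 "${HOME_DIR}/.ssh"
chmod 600 "${HOME_DIR}/.ssh/authorized_keys"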
chown --recursive "${USERNAME_OF_ACCOUNT}":"${USERNAME_OF_ACCOUNT}" "${HOME_DIR}/.ssh"

sudo chmod 775 -R /var/www
sudo chown -R $USERNAME_OF_ACCOUNT /var/www
rm -rf /var/www/html

Install Docker, run it as a service, and ensure the created user is added to the docker group.

sudo apt-get install -qq --yes \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88

sudo add-apt-repository --yes \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

sudo apt-get -qq update
sudo apt install -qq --yes docker-ce docker-ce-cli containerd.io

# Only add a group if it does not exist
sudo getent group docker || sudo groupadd docker
sudo usermod -aG docker $USERNAME_OF_ACCOUNT

# Enable docker
sudo systemctl enable docker

sudo curl -L "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
docker-compose --version

Disable root logins and any form of password-based authentication by altering sshd_config.

sed -i '/^PermitRootLogin/s/yes/no/' /etc/ssh/sshd_config
sed -i '/^PasswordAuthentication/s/yes/no/' /etc/ssh/sshd_config
sed -i '/^ChallengeResponseAuthentication/s/yes/no/' /etc/ssh/sshd_config
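
The sed lines above only change a directive that is already present, uncommented, and set to yes. Below is a minimal sketch of a more defensive variant, assuming GNU sed and a flat config file with no Match blocks. The name set_sshd_option is a hypothetical helper, not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): force "KEY VALUE"
# in an sshd_config-style file, whether the directive is currently
# uncommented, commented out, or missing altogether.
# Usage: set_sshd_option KEY VALUE [FILE]
set_sshd_option() {
    key="$1"
    value="$2"
    file="${3:-/etc/ssh/sshd_config}"
    if grep -Eq "^[[:space:]]*#?[[:space:]]*${key}[[:space:]]" "$file"; then
        # Rewrite every line that sets the directive, commented or not.
        sed -i -E "s|^[[:space:]]*#?[[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        # Directive absent: append it. Assumes the file has no Match
        # blocks, since an appended line would land inside the last one.
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Example calls (left commented out so that defining the helper changes nothing):
# set_sshd_option PermitRootLogin no
# set_sshd_option PasswordAuthentication no
# set_sshd_option ChallengeResponseAuthentication no
```

Running sshd -t before restarting the daemon checks the edited file for syntax errors, which helps avoid locking yourself out.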

Configure the firewall and fail2ban.

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow http
sudo ufw allow https
sudo ufw reload
sudo ufw --force enable && sudo ufw status verbose

sudo apt-get -qq install --yes fail2ban
sudo systemctl enable fail2ban
sudo systemctl start fail2ban

Swapfiles.

sudo fallocate -l 1G /swapfile && ls -lh /swapfile
sudo chmod 0600 /swapfile && ls -lh /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile && sudo swapon --show
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

Unattended updates, and restart the ssh daemon.

sudo apt install -qq unattended-upgrades
sudo systemctl restart ssh

Some questions

You can assume these questions are cost-benefit focused, i.e. is it worth my time to investigate this, versus something else that may have better gains given my limited time.

  1. Obviously, any critiques of the above provisioning process are appreciated—both on the micro level of criticising particular lines, or zooming out and saying “well why don’t you do this instead…”. I can’t know what I don’t know.

  2. Is it worth investigating tools such as ss or lynis (https://github.com/CISOfy/lynis) to perform server auditing? I don’t have to meet any compliance requirements at this point.

  3. Do I get any meaningful increase in security by implementing 2FA on login here using google authenticator? As far as I can see, as long as I'm using best practices to actually ssh into our boxes, then the likeliest risk profile for unwanted access probably isn’t via the authentication mechanism I use personally to access my servers.

  4. Am I missing anything here? Beyond the provisioning script itself, I adhere to best practices around storing and generating passwords and ssh keys.

Some notes and comments

  1. Eventually I'll use the hosting provider's API to spin up and spin down VPSs on the fly via a custom management application, which gives me an opportunity to programmatically execute the provisioning script above and run some other pre- and post-provisioning things, like deployment of the application and so forth.

  2. Usage alerts and monitoring are configured within DigitalOcean's console, and alerts are sent to our business' Slack for me to action as needed. Currently, I’m settling on the following alerts:
    1. Server CPU utilisation greater than 80% for 5 minutes.
    2. Server memory usage greater than 80% for 5 minutes.
    3. I’m also looking at setting up daily fail2ban status alerts if needed.

8 comments

  1. [4]
    vegai
    • If you use bash for important stuff, go "secure mode", i.e. start your script with
    set -eu
    set -o pipefail
    

    This prevents a few potential fuckups.

    • Run shellcheck on your scripts. This is a shell script linter
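
A small sketch of what those options change, assuming bash is installed. Each check runs in a separate bash process so the options do not affect the calling shell. The variable names are made up for the demonstration.

```shell
#!/bin/sh
# Without pipefail, a pipeline reports only the status of its last command,
# so a failed download piped into another command goes unnoticed.
plain=$(bash -c 'false | true; echo "$?"')
strict=$(bash -c 'set -o pipefail; false | true; echo "$?"')
echo "pipeline status without pipefail: ${plain}"
echo "pipeline status with pipefail:    ${strict}"

# With -u, expanding an unset variable is an error instead of an empty string.
if bash -c 'set -u; echo "${NOT_SET_ANYWHERE}"' 2>/dev/null; then
    unset_ok=yes
else
    unset_ok=no
fi
echo "unset variable tolerated under -u: ${unset_ok}"
```

Note that -u does not catch a variable that is set but empty, such as USERNAME_OF_ACCOUNT= in the script above. A guard like : "${USERNAME_OF_ACCOUNT:?must be set}" near the top covers that case.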

    If you start getting more infrastructure, you might want to check out a more declarative form of scripting, like Ansible. If you write your Ansible scripts correctly, you can run them several times on your machines, and only the changes get applied. This is a much nicer experience in the long run.

    Do I get any meaningful increase in security by implementing 2FA on login here using google authenticator?

    2FA always gives you a nice improvement in security, so it kinda depends on how important you think security is. The threat here is probably that somebody gets hold of your private key, which is probably a bit easier than getting hold of your private key AND your phone without you noticing.

    8 votes
    1. [3]
      admicos

      If you use bash for important stuff, go "secure mode", i.e. start your script with
      set -eu
      set -o pipefail

      You can also use the #!/bin/bash -euo pipefail shebang to do all this in a single line.
      If you're concerned about POSIX support, -o pipefail is a bash-ism (IIRC) so you might want to avoid it.

      3 votes
      1. [2]
        arp242

        The problem with that is that typing bash script.bash won't pick up on that. I'd strongly recommend against using it, as changing behaviour depending on how you run the script is going to be very confusing.
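
To illustrate, here is a sketch using a throwaway script created with mktemp. It uses only -e in the shebang, and it spells out the command that direct execution would effectively run rather than relying on the temp directory allowing execution.

```shell
#!/bin/sh
# Throwaway script whose shebang asks for errexit (-e).
script=$(mktemp)
printf '%s\n' '#!/bin/bash -e' 'false' 'echo reached' > "$script"

# "bash script" never looks at the shebang, so -e is not in effect
# and execution continues past the failing command.
via_bash=$(bash "$script")

# Executing the file directly makes the kernel run the interpreter with
# the shebang's argument, i.e. "bash -e script"; that is spelled out here.
via_shebang=$(bash -e "$script" || true)

echo "bash script.bash prints:   '${via_bash}'"
echo "bash -e script.bash prints: '${via_shebang}'"
rm -f "$script"
```

Putting set -e inside the script body gives the same behaviour however the script is invoked.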

        3 votes
        1. admicos

          Fair enough. I usually execute all my scripts directly (./script.sh) so I never experienced that.

          1 vote
  2. arp242

    My general strategy with this kind of script is to make it repeatable: running it twice should work. The advantage of this is that if you add something later or if it errors out half-way you can just run the script again without having to muck about or comment out selected lines.

    So, for example, instead of just useradd I'd do something like:

    grep -q "^${USERNAME_OF_ACCOUNT}:" /etc/passwd || useradd [..]
    

    And instead of ln -s I'd use ln -sf so it won't error out if the link already exists (or updates it if the location changed). Some similar stuff applies to various other places.
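
As a sketch of the "similar stuff" elsewhere in the script: the echo ... | tee -a /etc/fstab line adds a duplicate entry on every run. The helper below, under the hypothetical name append_line_once, appends only when the exact line is missing. The demonstration runs against scratch files rather than the real /etc/fstab.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE only if that exact line is absent.
# Usage: append_line_once LINE FILE
append_line_once() {
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration in a scratch directory instead of /etc.
scratch=$(mktemp -d)
append_line_once '/swapfile none swap sw 0 0' "${scratch}/fstab"
append_line_once '/swapfile none swap sw 0 0' "${scratch}/fstab"   # second run: no change
fstab_entries=$(grep -c 'swapfile' "${scratch}/fstab")
echo "fstab entries after two runs: ${fstab_entries}"

# ln -sf replaces an existing link instead of failing, so a rerun succeeds.
touch "${scratch}/docker-compose"
ln -sf "${scratch}/docker-compose" "${scratch}/link"
ln -sf "${scratch}/docker-compose" "${scratch}/link"
rm -rf "${scratch}"
```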

    I'd also just run the entire thing as root instead of sudo anywhere. You can always su to another user from there if need be. It's just easier.

    I never understood why people use ALL_CAPS for variable names in scripts; no other language does that. It's a bit of a stylistic thing, but I only use ALL_CAPS for environment variables rather than local variables (similar to how ALL_CAPS is used for globals in C).

    Do I get any meaningful increase in security by implementing 2FA on login here
    using google authenticator? As far as I can see, as long as I'm using best
    practices to actually ssh into our boxes, then the likeliest risk profile for
    unwanted access probably isn’t via the authentication mechanism I use
    personally to access my servers.


    If you use a key with a passphrase then you already have 2FA: a passphrase you know and a key you have.

    7 votes
  3. spit-evil-olive-tips

    I'll second the recommendation for Ansible. There's a variety of config-management tools like it, Chef/Puppet/Ansible/Salt are probably the 4 most well-known ones, but I think Ansible is the easiest one to get started with if you've never used any of them.

    Here's a real-world example, this installs Docker and enables it, though for CentOS and not Ubuntu:

    ---
    - hosts: all
      become: yes
      tasks:
        - yum_repository:
            name: docker
            description: docker
            baseurl: https://download.docker.com/linux/centos/7/$basearch/stable
            gpgcheck: yes
            gpgkey: https://download.docker.com/linux/centos/gpg
    
        - yum:
            name: docker-ce-19.03.12-3.el7
    
        - systemd:
            name: docker
            state: started
            enabled: yes
    

    Ansible turns each of those tasks into a small Python script, SCPs it to the target host, and executes it (or just executes it locally if you're running in "local" mode). The scripts Ansible generates are carefully written to be no-ops if the thing they're trying to do is already done (the five-dollar word for this is idempotent).

    The big benefit of this comes when you try to start upgrading. Say you're running Docker version X, and want to upgrade to Y. You update your provisioning script to install version Y on newly-provisioned machines. Can you re-run that provisioning script on an already-running machine and have it upgrade Docker? Are you confident that doing so won't have any adverse effects? Tools like Ansible handle that for you.

    When you get to the point of wanting to spin up VMs automatically in response to demand, take a look at Packer, which can provision DigitalOcean VM templates, and can provision them using Ansible. That means, rather than starting up a bare Ubuntu VM and doing all this provisioning "live" (while your app is under load and you want the new server to start serving traffic ASAP) you can do this provisioning ahead of time, save it to a template, then just have DO launch that template. If you want, you could even include that in your CI process, so that each "release" includes a pre-built VM template with your service already bundled, that can start serving real traffic as soon as it's launched.

    Various other feedback:

    I wouldn't recommend turning on unattended upgrades. For one thing, the upgrade schedule is typically randomized, to avoid every machine running auto-upgrades waking up at midnight Saturday or whatever and hammering the Ubuntu update server. So it's a coin flip whether your production server will get auto-upgraded before the staging server. And with a single production server, you're signing up for possible unscheduled downtime when the upgrade happens.

    The standard SSH hardening you've done (disabling root, disabling passwords, running fail2ban) is very likely sufficient, and I don't think you need Google Authenticator for 2FA. However, make sure you enable 2FA for things like your DigitalOcean login.

    Is your app a single Docker container? If so I think docker-compose is a bit overkill. You can just write a systemd service that does docker run for you.
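
A rough sketch of what that could look like, written as a shell function so it can slot into the provisioning script. The function name write_container_unit, the container name, the image, and the 127.0.0.1:8080 port mapping are all placeholders for illustration, not details from the post.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd unit that runs a single container.
# Usage: write_container_unit NAME IMAGE DEST_FILE
write_container_unit() {
    name="$1"
    image="$2"
    dest="$3"
    cat > "$dest" <<EOF
[Unit]
Description=${name} container
After=docker.service
Requires=docker.service

[Service]
Restart=always
# Remove a stale container from an unclean shutdown; the leading "-" ignores failure.
ExecStartPre=-/usr/bin/docker rm -f ${name}
# The port mapping is an assumed example; nginx would proxy to it.
ExecStart=/usr/bin/docker run --rm --name ${name} -p 127.0.0.1:8080:8080 ${image}
ExecStop=/usr/bin/docker stop ${name}

[Install]
WantedBy=multi-user.target
EOF
}

# Example call (commented out; names are placeholders):
# write_container_unit myapp example/myapp:latest /etc/systemd/system/myapp.service
```

After writing the unit, systemctl daemon-reload followed by systemctl enable --now myapp would start the container and keep it running across reboots.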

    5 votes
  4. tomf

    Some will disagree, but I like to change the default SSH port. Especially with f2b, your logs will be cleaner since you won't have all of the lamers who look for 22 throwing errors.

    3 votes