Showing only topics in ~comp with the tag "devops".

OpenTofu denies Hashicorp's code-stealing accusations
- open source
Article 660 words
1 comment

devops.com

April 11

18 votes
How I keep myself alive using Golang

Link

1 comment

bytesizego.com

March 5

22 votes
Welcome to the Party, BizDevOps! an explainer.

Article 1593 words

4 comments

functionize.com

February 14

4 votes
Retrospective on infrastructure decisions after 4 years running infrastructure at a startup

Article 3607 words, published Feb 1 2024

1 comment

cep.dev

February 12

10 votes
Pipelight: Automation pipelines but easier

Link

3 comments

pipelight.dev

January 3

10 votes
A few easy linux commands, and a real-world example on how to use them in a pinch
- linux
Text 936 words

Below is a summary of a real-world performance investigation I recently went through. The tools I used are installed on virtually all Linux systems, but I know some people don't know them and would jump straight to heavyweight log analysis services and whatnot, or write their own solution.

Let's say you have request log sampling in a bunch of log files that contain lines like these:

127.0.0.1 [2021-05-27 23:28:34.460] "GET /static/images/flags/2/54@3x.webp HTTP/2" 200 1806 TLSv1.3 HIT-CLUSTER SessionID:(null) Cache:max-age=31536000
127.0.0.1 [2021-05-27 23:51:22.019] "GET /pl/player/123456/changelog/ HTTP/1.1" 200 16524 TLSv1.2 MISS-CLUSTER SessionID:(null) Cache:

You might recognize Fastly logs there (IP anonymized). Now, there's a lot you might care about in this log file, but in my case, I wanted to get a breakdown of hits vs misses by URL.

So, first step, let's concatenate all the log files with cat *.log > all.txt, so we can work off a single file.

Then, let's split the file in two: hits and misses. There are a few different values for each, but the majority are covered by either HIT-CLUSTER or MISS-CLUSTER, so we can just grep for them like so:
```
grep HIT-CLUSTER all.txt > hits.txt; grep MISS-CLUSTER all.txt > misses.txt
```
However, we only care about url and whether it's a hit or a miss. So let's clean up those hits and misses with cut. The way cut works, it takes a delimiter (-d) and cuts the input based on that; you then give it a range of "fields" (-f) that you want.

In our case, if we cut on spaces, the first sample line splits into fields like this: field 1 is 127.0.0.1, field 2 is [2021-05-27, field 3 is 23:28:34.460], field 4 is "GET, field 5 is /static/images/flags/2/54@3x.webp, field 6 is HTTP/2", and so on.

We care about the 5th value only. So let's do: cut -d" " -f5 to get that. We will also sort the result, because future operations will require us to work on a sorted list of values.
```
cut -d" " -f5 hits.txt | sort > hits-sorted.txt; cut -d" " -f5 misses.txt | sort > misses-sorted.txt
```
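To see the field numbering in action without the real log files, here is a minimal sketch that runs the same grep, cut and sort steps on the two sample lines from above. The helper names `extract_urls` and `sample_log` are made up for this illustration.

```shell
#!/bin/sh
# Sketch: print the URL (5th space-separated field) of each log line on stdin,
# sorted. extract_urls is a hypothetical helper name, not from the post.
extract_urls() {
    cut -d' ' -f5 | sort
}

# The two sample log lines quoted earlier in the post.
sample_log() {
    cat <<'EOF'
127.0.0.1 [2021-05-27 23:28:34.460] "GET /static/images/flags/2/54@3x.webp HTTP/2" 200 1806 TLSv1.3 HIT-CLUSTER SessionID:(null) Cache:max-age=31536000
127.0.0.1 [2021-05-27 23:51:22.019] "GET /pl/player/123456/changelog/ HTTP/1.1" 200 16524 TLSv1.2 MISS-CLUSTER SessionID:(null) Cache:
EOF
}

sample_log | grep HIT-CLUSTER | extract_urls    # URL of the cache hit
sample_log | grep MISS-CLUSTER | extract_urls   # URL of the cache miss
```

Note that this only works because nothing before the URL contains a variable number of spaces; a log format with free-text fields earlier in the line would need a different approach.
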
Now we can start doing some neat stuff. wc (word count) is an awesome utility: it lets you count characters, words or lines very easily. wc -l counts lines in its input, and since we have one value per line, we can already count our hits and misses:
```
$ wc -l hits-sorted.txt misses-sorted.txt
  132523 hits-sorted.txt
  220779 misses-sorted.txt
  353302 total
```
132523 hits against 220779 misses is roughly a 1:1.67 ratio of hits to misses. That's not great…

Alright, now I'm also interested in how many unique URLs are hit versus missed. The uniq tool only collapses adjacent duplicate lines, so the input has to be sorted for it to deduplicate the entire file. We already did that. We can now count our URLs with uniq < hits-sorted.txt | wc -l; uniq < misses-sorted.txt | wc -l. We get 49778 and 201178, respectively. It's to be expected that most of our cache misses would be in "rarer" URLs; this gives us roughly a 1:4 ratio of cached to uncached URLs.
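A tiny sketch of why the sort matters for uniq, using three made-up URLs rather than the real data. The function names are hypothetical.

```shell
#!/bin/sh
# uniq alone only collapses adjacent duplicates.
count_unique_adjacent() {
    uniq | wc -l | tr -d ' '
}

# Sorting first puts duplicates next to each other, so uniq sees them all.
count_unique_sorted() {
    sort | uniq | wc -l | tr -d ' '
}

printf '/a\n/b\n/a\n' | count_unique_adjacent   # prints 3: the two /a lines are not adjacent
printf '/a\n/b\n/a\n' | count_unique_sorted     # prints 2
```
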

Let's say we want to dig down further into which URLs are most often hitting the cache, specifically. We can add -c to uniq in order to get a duplicate count in front of our URLs. To get the top ones at the top, we can then use sort, in reverse sort mode (-r), and it also needs to be numeric sort, not alphabetic (-n). head lets us get the top 10.
```
$ uniq -c < hits-sorted.txt | sort -nr | head
    815 /static/app/webfonts/fa-solid-900.woff2?d720146f1999
    793 /static/app/images/1.png
    786 /static/app/fonts/nunito-v9-latin-ext_latin-regular.woff2?d720146f1999
    760 /static/CACHE/js/output.cee5c4089626.js
    758 /static/images/crest/3/light/notfound.png
    757 /static/CACHE/css/output.4f2b59394c83.css
    756 /static/app/webfonts/fa-regular-400.woff2?d720146f1999
    754 /static/app/css/images/loading.gif?d720146f1999
    750 /static/app/css/images/prev.png?d720146f1999
    745 /static/app/css/images/next.png?d720146f1999
```
And same for misses:
```
$ uniq -c < misses-sorted.txt | sort -nr | head
     56 /
     14 /player/237678/
     13 /players/
     12 /teams/
     11 /players/top/
<snip>
```
So far this tells us static files are most often hit, and for misses it also tells us… something, but we can't quite track it down yet (and we won't, not in this post). We're not adjusting for how often the page is hit as a whole, this is still just high-level analysis.

One last thing I want to show you! Let's take everything we learned and analyze those URLs by prefix instead. We can cut our URLs again by slash with cut -d"/". If we want the first prefix, we can do -f1-2, or -f1-3 for the first two prefixes. Let's look!
```
$ cut -d'/' -f1-2 < hits-sorted.txt | uniq -c | sort -nr | head
 100189 /static
   5948 /es
   3069 /player
   2480 /fr
   2476 /es-mx
   2295 /pt-br
   2094 /tr
   1939 /it
   1692 /ru
   1626 /de
```
```
$ cut -d'/' -f1-2 < misses-sorted.txt | uniq -c | sort -nr | head
  66132 /static
  18578 /es
  17448 /player
  17064 /tr
  11379 /fr
   9624 /pt-br
   8730 /es-mx
   7993 /ru
   7689 /zh-hant
   7441 /it
```
This gives us hit-miss ratios by prefix. Neat, huh?
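The two listings still have to be compared by eye. As a rough sketch (not something from the original investigation), the per-prefix counts can be joined with awk to print a hit percentage per prefix. The function names and the intermediate "prefix count" file format are assumptions made for this example.

```shell
#!/bin/sh
# prefix_counts: stdin is one URL per line; stdout is "<prefix> <count>" lines.
prefix_counts() {
    cut -d'/' -f1-2 | sort | uniq -c | awk '{ print $2, $1 }'
}

# hit_ratio_by_prefix HITS_FILE MISSES_FILE
# Both files hold "<prefix> <count>" lines. Prints, per prefix:
# prefix, hit count, miss count, and hits as a percentage of the total.
hit_ratio_by_prefix() {
    awk '
        FILENAME == ARGV[1] { hits[$1] = $2; seen[$1] = 1; next }
        { misses[$1] = $2; seen[$1] = 1 }
        END {
            for (p in seen) {
                total = hits[p] + misses[p]
                printf "%s %d %d %.1f%%\n", p, hits[p], misses[p], 100 * hits[p] / total
            }
        }
    ' "$1" "$2" | sort
}

# Demo with three of the prefix counts from the listings above.
hits_file=$(mktemp)
misses_file=$(mktemp)
printf '%s\n' '/static 100189' '/es 5948' '/player 3069' > "$hits_file"
printf '%s\n' '/static 66132' '/es 18578' '/player 17448' > "$misses_file"
hit_ratio_by_prefix "$hits_file" "$misses_file"
rm -f "$hits_file" "$misses_file"
```

With those numbers, /static comes out at roughly 60% hits and /player at roughly 15%. On the real data you would feed it `prefix_counts < hits-sorted.txt` and `prefix_counts < misses-sorted.txt`, each saved to a file.
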
0 comments

Adys

May 29, 2021

13 votes

NixOS Configuration for a VPS

linux

Text 1419 words


Since I took so long to reply to Tips to use NixOS on a server? by @simao, I decided to create a new topic to share my configs. Hopefully this is informative for anyone looking to do similar things - I'll also gladly take critiques, since my setup is probably not perfect.

First, I will share the output of 'lsblk' on my VPS:

NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
vda       253:0    0   180G  0 disk  
├─vda1    253:1    0   512M  0 part  /boot
└─vda2    253:2    0 179.5G  0 part  
  └─crypt 254:0    0 179.5G  0 crypt

That is, I use an unencrypted /boot partition, vda1, with GRUB 2 to prompt for a passphrase during boot, to unlock the LUKS encrypted vda2. I prefer to use ZFS as my file system for the encrypted drive, and LUKS rather than ZFS encryption. This is an MBR drive, since that's what my VPS provider uses, though UEFI would look the same. The particular way I do this also requires access through the provider's tools, and not ssh or similar. The hardware-configuration.nix file reflects this:

Click to view the hardware configuration file

# Do not modify this file!  It was generated by ‘nixos-generate-config’
# and may be overwritten by future invocations.  Please make changes
# to /etc/nixos/configuration.nix instead.
{ config, lib, pkgs, modulesPath, ... }:

{
  imports =
    [ (modulesPath + "/profiles/qemu-guest.nix")
    ];

  boot.initrd.availableKernelModules = [ "aes_x86_64" "ata_piix" "cryptd" "uhci_hcd" "virtio_pci" "sr_mod" "virtio_blk" ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ ];
  boot.extraModulePackages = [ ];

  fileSystems."/" =
    { device = "rpool/root/nixos";
      fsType = "zfs";
    };

  fileSystems."/home" =
    { device = "rpool/home";
      fsType = "zfs";
    };

  fileSystems."/boot" =
    { device = "/dev/disk/by-uuid/294de4f1-72e2-4377-b565-b3d4eaaa37b6";
      fsType = "ext4";
    };

  swapDevices = [ ];

}

I disobey the warning at the top to add `"aes_x86_64"` and `"cryptd"` to the available kernel modules, to speed up encryption. The `configuration.nix` follows:

Click to view the configuration file

# Edit this configuration file to define what should be installed on
# your system.  Help is available in the configuration.nix(5) man page
# and in the NixOS manual (accessible by running ‘nixos-help’).

{ config, lib, pkgs, ... }:

{
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
    ];

  # Hardware stuff
  # add the following to hardware-configuration.nix - speeds up encryption
  #boot.initrd.availableKernelModules ++ [ "aes_x86_64" "cryptd" ];
  boot.initrd.luks.devices.crypt = {
    # Change this if moving to another machine!
    device = "/dev/disk/by-uuid/86090289-1c1f-4935-abce-a1aeee1b6125";
  };
  boot.kernelParams = [ "zfs.zfs_arc_max=536870912" ]; # sets zfs arc cache max target in bytes
  boot.supportedFilesystems = [ "zfs" ];
  nix.maxJobs = lib.mkDefault 6; # number of cpu cores

  # Use the GRUB 2 boot loader.
  boot.loader.grub.enable = true;
  boot.loader.grub.version = 2;
  # boot.loader.grub.efiSupport = true;
  # boot.loader.grub.efiInstallAsRemovable = true;
  # boot.loader.efi.efiSysMountPoint = "/boot/efi";
  # Define on which hard drive you want to install Grub.
  boot.loader.grub.device = "/dev/vda"; # or "nodev" for efi only
  boot.loader.grub.enableCryptodisk = true;
  boot.loader.grub.zfsSupport = true;

  networking.hostName = "m"; # Define your hostname.
  # networking.wireless.enable = true;  # Enables wireless support via wpa_supplicant.

  # The global useDHCP flag is deprecated, therefore explicitly set to false here.
  # Per-interface useDHCP will be mandatory in the future, so this generated config
  # replicates the default behaviour.
  networking.useDHCP = false;
  networking.interfaces.ens3.useDHCP = true;
  networking.hostId = "aoeu"; # set this to the first eight characters of /etc/machine-id for zfs
  networking.nat = {
    enable = true;
    externalInterface = "ens3"; # this may not be the interface name
    internalInterfaces = [ "wg0" ];
  };
  networking.firewall = {
    enable = true;
    allowedTCPPorts = [ 53 25565 ]; # open 53 for DNS and 25565 for Minecraft
    allowedUDPPorts = [ 53 51820 ]; # open 53 for DNS and 51820 for Wireguard - change the Wireguard port
  };
  networking.wg-quick.interfaces = {
    wg0 = {
      address = [ "10.0.0.1/24" "fdc9:281f:04d7:9ee9::1/64" ];
      listenPort = 51820;
      privateKeyFile = "/root/wireguard-keys/privatekey"; # fill this file with the server's private key and make it so only root has read/write access

      postUp = ''
        ${pkgs.iptables}/bin/iptables -A FORWARD -i wg0 -j ACCEPT
        ${pkgs.iptables}/bin/iptables -t nat -A POSTROUTING -s 10.0.0.1/24 -o ens3 -j MASQUERADE
        ${pkgs.iptables}/bin/ip6tables -A FORWARD -i wg0 -j ACCEPT
        ${pkgs.iptables}/bin/ip6tables -t nat -A POSTROUTING -s fdc9:281f:04d7:9ee9::1/64 -o ens3 -j MASQUERADE
      '';

      preDown = ''
        ${pkgs.iptables}/bin/iptables -D FORWARD -i wg0 -j ACCEPT
        ${pkgs.iptables}/bin/iptables -t nat -D POSTROUTING -s 10.0.0.1/24 -o ens3 -j MASQUERADE
        ${pkgs.iptables}/bin/ip6tables -D FORWARD -i wg0 -j ACCEPT
        ${pkgs.iptables}/bin/ip6tables -t nat -D POSTROUTING -s fdc9:281f:04d7:9ee9::1/64 -o ens3 -j MASQUERADE
      '';

      peers = [
        { # peer0
          publicKey = "{client public key}"; # replace this with the client's public key
          presharedKeyFile = "/root/wireguard-keys/preshared_from_peer0_key"; # fill this file with the preshared key and make it so only root has read/write access
          allowedIPs = [ "10.0.0.2/32" "fdc9:281f:04d7:9ee9::2/128" ];
        }
      ];
    };
  };

  # Configure network proxy if necessary
  # networking.proxy.default = "http://user:password@proxy:port/";
  # networking.proxy.noProxy = "127.0.0.1,localhost,internal.domain";

  nixpkgs.config = {
    allowUnfree = true; # don't set this if you want to ensure only free software
  };

  # Select internationalisation properties.
  i18n.defaultLocale = "en_US.UTF-8";
  console = {
    font = "Lat2-Terminus16";
    keyMap = "us";
  };

  # Set your time zone.
  time.timeZone = "America/New_York"; # set this to the same timezone your server is located in

  # List packages installed in system profile. To search, run:
  # $ nix search wget
  environment = {
    systemPackages = with pkgs; let
      nvimcust = neovim.override { # lazy minimal neovim config
        viAlias = true;
        vimAlias = true;
        withPython = true;
        configure = {
          packages.myPlugins = with pkgs.vimPlugins; {
            start = [ deoplete-nvim ];
            opt = [];
          };
          customRC = ''
            if filereadable($HOME . "/.config/nvim/init.vim")
              source ~/.config/nvim/init.vim
            endif

            set number

            set expandtab

            filetype plugin on
            syntax on

            let g:deoplete#enable_at_startup = 1
          '';
        };
      };
    in
    [
      jdk8
      nvimcust
      p7zip
      wget
      wireguard
    ];
  };

  # Some programs need SUID wrappers, can be configured further or are
  # started in user sessions.
  # programs.mtr.enable = true;
  # programs.gnupg.agent = {
  #   enable = true;
  #   enableSSHSupport = true;
  #   pinentryFlavor = "gnome3";
  # };

  # List services that you want to enable:

  # Enable the OpenSSH daemon.
  services = {
    dnsmasq = {
      enable = true;
      # this allows DNS requests from wg0 to be forwarded to the DNS server on this machine
      extraConfig = ''
        interface=wg0
      '';
    };
    fail2ban = {
      enable = true;
    };
    openssh = {
      enable = true;
      permitRootLogin = "no";
    };
    zfs = {
      autoScrub = {
        enable = true;
        interval = "monthly";
      };
    };
  };

  # Set sudo to request root password for all users
  # this should be changed for a multi-user server
  security.sudo.extraConfig = ''
    Defaults rootpw
  '';

  # Define a user account. Don't forget to set a password with ‘passwd’.
  users.users = {
    vpsadmin = { # admin account that has a password
      isNormalUser = true;
      home = "/home/vpsadmin";
      extraGroups = [ "wheel" ]; # Enable ‘sudo’ for the user.
      shell = pkgs.zsh;
    };
    mcserver = { # passwordless user to run a service - in this instance minecraft
      isNormalUser = true;
      home = "/home/mcserver";
      extraGroups = [];
      shell = pkgs.zsh;
    };
  };

  systemd = {
    services = {
      mcserverrun = { # this service runs a systemd sandboxed modded minecraft server as user mcserver
        enable = true;
        description = "Start and keep minecraft server running";
        wants = [ "network.target" ];
        after = [ "network.target" ];
        serviceConfig = {
          User = "mcserver";
          NoNewPrivileges = true;
          PrivateTmp = true;
          ProtectSystem = "strict";
          PrivateDevices = true;
          ReadWritePaths = "/home/mcserver/Eternal_current";
          WorkingDirectory = "/home/mcserver/Eternal_current";
          ExecStart = "${pkgs.jdk8}/bin/java -Xms11520M -Xmx11520M -server -XX:+AggressiveOpts -XX:ParallelGCThreads=3 -XX:+UseConcMarkSweepGC -XX:+UnlockExperimentalVMOptions -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent -XX:MaxGCPauseMillis=10 -XX:GCPauseIntervalMillis=50 -XX:+UseFastAccessorMethods -XX:+OptimizeStringConcat -XX:NewSize=84m -XX:+UseAdaptiveGCBoundary -XX:NewRatio=3 -jar forge-1.12.2-14.23.5.2847-universal.jar nogui";
          Restart = "always";
          RestartSec = 12;
        };
        wantedBy = [ "multi-user.target" ];
      };
      mcserverscheduledrestart = { # this service restarts the minecraft server on a schedule
        enable = true;
        description = "restart mcserverrun service";
        serviceConfig = {
          Type = "oneshot";
          ExecStart = "${pkgs.systemd}/bin/systemctl try-restart mcserverrun.service";
        };
      };
    };
    timers = {
      mcserverscheduledrestart = { # this timer triggers the service of the same name
        enable = true;
        description = "restart mcserverrun service daily";
        timerConfig = {
          OnCalendar = "*-*-* 6:00:00";
        };
        wantedBy = [ "timers.target" ];
      };
    };
  };

  # This value determines the NixOS release from which the default
  # settings for stateful data, like file locations and database versions
  # on your system were taken. It‘s perfectly fine and recommended to leave
  # this value at the release version of the first install of this system.
  # Before changing this value read the documentation for this option
  # (e.g. man configuration.nix or on https://nixos.org/nixos/options.html).
  system.stateVersion = "20.09"; # Did you read the comment?

}
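Two values in the configuration above are derived rather than chosen: the `networking.hostId` comment says to take the first eight characters of /etc/machine-id, and `zfs.zfs_arc_max` is given in bytes. A small shell sketch of both calculations follows; the helper names are hypothetical, and `head -c` is assumed to be available (it is in GNU and BSD head).

```shell
#!/bin/sh
# host_id_from FILE: print the first eight characters of FILE,
# as the hostId comment in the config suggests for /etc/machine-id.
host_id_from() {
    head -c 8 "$1"
}

# mib_to_bytes N: convert mebibytes to bytes for zfs.zfs_arc_max.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 512   # prints 536870912, the value used in boot.kernelParams

# Only try the real machine-id if this system has one.
if [ -r /etc/machine-id ]; then
    host_id_from /etc/machine-id
    echo
fi
```
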

You'll notice that this server acts as a Wireguard endpoint and as a Minecraft server. I described the first part on the [NixOS wiki page for Wireguard](https://nixos.wiki/wiki/Wireguard) under the section that mentions dnsmasq. The second part is done using NixOS's systemd support, which can be a bit confusing at first but is easy enough once you know how it works.

Edit: Also, the provider I use is ExtraVM, who has been excellent.

6 votes

Tips to use NixOS on a server?
- linux
Ask (advice)

I see some people using NixOS on their servers. I would like to try it out to self-host some services and learn about NixOS.

I use Hetzner and they have a NixOS ISO available, so I can just use that to install NixOS. But how do people manage remote instances of NixOS? Do they just use Ansible or something like it to run nix on the host, or is there a better way?

Thanks

5 comments

simao

January 3, 2021

11 votes
In which a foolish developer tries DevOps: critique my VPS provisioning script!
- security
Ask (advice)

I'm attempting to provision two mirror staging and production environments for a future SaaS application that we're close to launching as a company, and I'd like to get some feedback on the provisioning script I've created that takes a default VPS from our hosting provider, DigitalOcean, and readies it for being a secure hosting environment for our application instance (which runs inside Docker, and persists data to an unrelated managed database).

I'm sticking with a simple infrastructure architecture at the moment: A single VPS which runs both nginx and the application instance inside a containerised docker service as mentioned earlier. There's no load balancers or server duplication at this point. @Emerald_Knight very kindly provided me in the Tildes Discord with some overall guidance about what to aim for when configuring a server (limit damage as best as possible, limit access when an attack occurs)—so I've tried to be thoughtful and integrate that paradigm where possible (disabling root login, etc).

I’m not a DevOps or sysadmin-oriented person by trade (I stick to programming most of the time), but this role falls to me as the technical person in this business, so the last few days have been a lot of reading and readying. I’ll run through the provisioning flow step by step. Oh, and for reference, this is Ubuntu 20.04 LTS.

First step is self-explanatory.
```
#!/bin/sh

# Name of the user to create and grant privileges to.
USERNAME_OF_ACCOUNT=

sudo apt-get -qq update
sudo apt install -qq --yes nginx
sudo systemctl restart nginx
```
Next, create my sudo user, add them to the groups needed, require a password change on first login, then copy across any provided authorised keys from the root user which you can configure to be seeded to the VPS in the DigitalOcean management console.
```
useradd --create-home --shell "/bin/bash" --groups sudo,www-data "${USERNAME_OF_ACCOUNT}"
passwd --delete $USERNAME_OF_ACCOUNT
chage --lastday 0 $USERNAME_OF_ACCOUNT

HOME_DIR="$(eval echo ~${USERNAME_OF_ACCOUNT})"
mkdir --parents "${HOME_DIR}/.ssh"
cp /root/.ssh/authorized_keys "${HOME_DIR}/.ssh"

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chown --recursive "${USERNAME_OF_ACCOUNT}":"${USERNAME_OF_ACCOUNT}" "${HOME_DIR}/.ssh"

sudo chmod 775 -R /var/www
sudo chown -R $USERNAME_OF_ACCOUNT /var/www
rm -rf /var/www/html
```
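One thing worth checking in the block above: `~` expands to the home directory of whoever runs the script, so if it runs as root the two `chmod` lines act on /root/.ssh rather than on the new user's directory. A minimal sketch of the same step with explicit paths follows. `install_authorized_keys` is a hypothetical helper, and the `chown` is left out because it needs root.

```shell
#!/bin/sh
# install_authorized_keys HOME_DIR KEYS_FILE
# Copies KEYS_FILE into HOME_DIR/.ssh/authorized_keys and tightens the
# permissions on that directory and file, never relying on "~".
install_authorized_keys() {
    home_dir=$1
    keys_file=$2
    mkdir -p "${home_dir}/.ssh"
    cp "${keys_file}" "${home_dir}/.ssh/authorized_keys"
    chmod 700 "${home_dir}/.ssh"
    chmod 600 "${home_dir}/.ssh/authorized_keys"
}

# In the provisioning script this might be called as:
# install_authorized_keys "${HOME_DIR}" /root/.ssh/authorized_keys
```
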
Install Docker, run it as a service, and ensure the created user is added to the docker group.
```
sudo apt-get install -qq --yes \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88

sudo add-apt-repository --yes \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

sudo apt-get -qq update
sudo apt install -qq --yes docker-ce docker-ce-cli containerd.io

# Only add a group if it does not exist
sudo getent group docker || sudo groupadd docker
sudo usermod -aG docker $USERNAME_OF_ACCOUNT

# Enable docker
sudo systemctl enable docker

sudo curl -L "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
docker-compose --version
```
Disable root logins and any form of password-based authentication by altering sshd_config.
```
sed -i '/^PermitRootLogin/s/yes/no/' /etc/ssh/sshd_config
sed -i '/^PasswordAuthentication/s/yes/no/' /etc/ssh/sshd_config
sed -i '/^ChallengeResponseAuthentication/s/yes/no/' /etc/ssh/sshd_config
```
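The sed expressions above only touch lines that already start with the directive and contain `yes`. A commented-out default (for example `#PermitRootLogin prohibit-password`) or a missing directive would be left as it is. Here is a sketch of a more forceful variant, assuming GNU sed as shipped on Ubuntu; `set_sshd_option` is a made-up helper name.

```shell
#!/bin/sh
# set_sshd_option FILE KEY VALUE
# Rewrites every line that sets KEY (commented out or not) to "KEY VALUE",
# or appends "KEY VALUE" if no such line exists. Assumes GNU sed (-i, -E).
set_sshd_option() {
    file=$1
    key=$2
    value=$3
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$file"; then
        sed -i -E "s/^[#[:space:]]*${key}[[:space:]].*/${key} ${value}/" "$file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Possible use in the provisioning script (needs root):
# set_sshd_option /etc/ssh/sshd_config PermitRootLogin no
# set_sshd_option /etc/ssh/sshd_config PasswordAuthentication no
# set_sshd_option /etc/ssh/sshd_config ChallengeResponseAuthentication no
```

Running `sshd -t` before restarting the daemon is a cheap way to catch a broken config while you still have a session open.
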
Configure the firewall and fail2ban.
```
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow http
sudo ufw allow https
sudo ufw reload
sudo ufw --force enable && sudo ufw status verbose

sudo apt-get -qq install --yes fail2ban
sudo systemctl enable fail2ban
sudo systemctl start fail2ban
```
Swapfiles.
```
sudo fallocate -l 1G /swapfile && ls -lh /swapfile
sudo chmod 0600 /swapfile && ls -lh /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile && sudo swapon --show
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```
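Because `tee -a` appends unconditionally, re-running the script would add a second identical swap line to /etc/fstab. A small sketch of an idempotent append follows; `ensure_line` is a hypothetical helper, not part of the original script.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Appends LINE to FILE only if FILE does not already contain that exact line.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Possible use in the provisioning script (needs root):
# ensure_line /etc/fstab '/swapfile none swap sw 0 0'
```
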
Unattended updates, and restart the ssh daemon.
```
sudo apt install -qq unattended-upgrades
sudo systemctl restart ssh
```
Some questions

You can assume these questions are cost-benefit focused, i.e. is it worth my time to investigate this, versus something else that may have better gains given my limited time.
1. Obviously, any critiques of the above provisioning process are appreciated—both on the micro level of criticising particular lines, or zooming out and saying “well why don’t you do this instead…”. I can’t know what I don’t know. 
2. Is it worth investigating tools such as ss or lynis (https://github.com/CISOfy/lynis) to perform server auditing? I don’t have to meet any compliance requirements at this point. 
3. Do I get any meaningful increase in security by implementing 2FA on login here using google authenticator? As far as I can see, as long as I'm using best practices to actually ssh into our boxes, then the likeliest risk profile for unwanted access probably isn’t via the authentication mechanism I use personally to access my servers. 
4. Am I missing anything here? Beyond the provisioning script itself, I adhere to best practices around storing and generating passwords and ssh keys.
Some notes and comments
1. Eventually I'll use the hosting provider's API to spin up and spin down VPSes on the fly via a custom management application, which gives me an opportunity to programmatically execute the provisioning script above and run some other pre- and post-provisioning things, like deployment of the application and so forth.
2. Usage alerts and monitoring are configured within DigitalOcean's console, and alerts are sent to our business' Slack for me to action as needed. Currently, I’m settling on the following alerts:
  1. Server CPU utilisation greater than 80% for 5 minutes.
  2. Server memory usage greater than 80% for 5 minutes.
  3. I’m also looking at setting up daily fail2ban status alerts if needed.
5 comments

unknown user

October 27, 2020

9 votes
What AWS services should have been called

Link

1 comment

expeditedsecurity.com

October 22, 2020

8 votes
Zero-downtime Kubernetes deployments

Article 1161 words

0 comments

sbg.technology

August 21, 2020

4 votes
7 Aspects of IT Certifications

Article

1 comment

Medium

July 28, 2020

1 vote
Coding and Tracing Workflow Remix (feat. Dark)
- programming
Article 927 words
1 comment

Substack: Clay Smith

March 4, 2020

3 votes
Chaos Engineering, Complexity, and Microservice Catalogs

Article 739 words

1 comment

Substack: Clay Smith

February 5, 2020

3 votes
Successfully Merging the Work of 1000+ Developers
- programming
Article 1895 words
1 comment

shopify.com

November 20, 2019

7 votes
If you’re not using SSH certificates you’re doing SSH wrong
- security
Article 3346 words
2 comments

smallstep.com

September 13, 2019

9 votes
Deploying containerized Docker instances in production?

Ask (advice)


Hello! After spending many development hours in my past years running on Virtualbox/Vagrant-style setups, I've decided to take the plunge into learning Docker, and after getting a few containers working, I'm now looking to figure out how to deploy this to production. I'm not a DevOps or infrastructure guy, my bread and butter is software, and although I've become significantly better at deploying & provisioning Linux VPS's, I'm still not entirely confident in my ability to deploy & manage such systems at scale and in production. But, I am now close to running my own business, so these requirements are suddenly going from "nice to have" to "critical".

As I mentioned, in the past when I've previously developed applications that have been pushed onto the web, I've tended to develop on my local machine, often with no specific configuration environment. If I did use an environment, it'd often be a Vagrant VM instance. From here, I'd push to GitHub, then from my VPS, pull down the changes, run any deployment scripts (recompile, restart nginx, etc), and I'm done.

I guess what I'm after with Docker is something that's more consistent between dev, testing, & prod, and is also more hands off in the deployment process. Yet, what I'm currently developing still does have differing configuration needs between dev and prod. For example, I'd like to use a hosted DB solution such as DigitalOcean Managed Databases in production, yet I'm totally fine using a Docker container for MySQL for local development. Is something like this possible? Does anyone have any recommendations around how to accomplish this, any do's and dont's, or any catches that are worth mentioning?

How about automating deployment from GitHub to production? I've never touched any CI/CD tools in my life, yet I know it's a hugely important part of the process when dealing with software in production, especially software that has clients dependent on it to function. Does anything specifically work well with Docker? Or GitHub? Ideally I want to be avoiding manual processes where I have to ssh in, and pull down the latest changes, half-remembering the commands I need to write to recompile and run the application again.

9 comments

unknown user

July 9, 2019

10 votes
Announcing workers.dev

Article 582 words

5 comments

cloudflare.com

February 20, 2019

9 votes
SRE mastery: Designing and developing for uptime

Article 1682 words

0 comments

Hewlett Packard Enterprise

November 13, 2018

4 votes
Let's talk best-practice Jenkins on AWS ECS

Text 704 words

[seen on reddit but no discussion - if it's not okay to seek out better discussion here after seeing something fall flat on reddit, I am very sorry and I'll delete promptly]

I've had some experience in this realm for a while now, but I'm having a little trouble with one issue in particular. Before I divulge, I'll present my thoughts on best practice and what I've been able to implement:
- Terraform everything (in accordance to terragrunt's "style guide" i.e. organization)
- THIS IS A BIG ONE: for the jenkins master task, make sure to use the following args to make sure jenkins jobs aren't super slow as hell to start:
```
-Djava.awt.headless=true -Dhudson.slaves.NodeProvisioner.initialDelay=0 -Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85
```
THIS IS A GAME CHANGER (more-so on k8s clusters when the ecs plugin isn't used... hint, it's shit).
- Create an EFS (in a separate terraform module) and mount it to the jenkins ECS cluster at /var/jenkins_home. Makes jenkins much more reliable through outages and easier to upgrade.
- Run a logging agent (via docker container) like logspout or newrelic or whatever IN USER_DATA and not as a task - that way you get logs if there are issues during user_data/cloud_init... this I'm actually not sure about. Running a container outside the context of an ECS task means the ECS agent can't really track it and allocate mem/cpu properly... but it does help with user_data triage.
- Use pipelines and git plugins to drive jobs. All jenkins jobs should be in source control!
- Make sure you set up docker cleanup jobs on DAY 1! If you have limited access to your cluster and you run out of disk due to docker cache, networks, volumes, etc... you're screwed till the admin ssh's in and runs a prune. Get a docker system prune going or the equivalent for each docker resource with appropriate filters... i.e. filter for anything that is dangling and older than a few days.
- Use Jenkins Global Libraries to make Jenkinsfiles cleaner (I always just use vars instead of groovy/java style packages because it's easier and less ugly)
- Jenkinsfiles should mostly call other bash files, make files, python scripts to generate and load prop files, etc. The less logic you put in a Jenkinsfile (which is just modified groovy) the better. String interpolation, among other things, is a fuckery that we don't have time to triage.
- (out-of-scope) Move to using k8s/EKS instead of ECS asap because the ECS plugin for jenkins is absolute shit and it doesn't use priority correctly (sorry whoever developed it and... oh wait abandoned it and hasn't merged anything for years... for real, it's cool, just give admin to someone else).
- (cultural) Stop calling them slaves. "Hey @eng, we're rotating slaves due to some cache issues. If you have been affected by race conditions in that past, our new update and slave rotation should fix that. Our update may have killed your job that was running on an old slave, just wait a few and the new slaves will be ready" <--This just doesn't look good.

Hope that was some good stuff for you guys. Maybe I'm preaching to the choir, but I've seen some pretty shit jenkins setups.

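For the docker cleanup bullet, here is a rough sketch of what a scheduled prune job might run. The `run` and `prune_docker` names and the `DRY_RUN` switch are made up for this example, and the 72h cutoff is just a placeholder. Volumes are left out because, as far as I know, `docker volume prune` does not accept an `until` filter.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN is 1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# prune_docker [AGE]: remove stopped containers, dangling images and
# unused networks older than AGE (default 72h).
prune_docker() {
    age=${1:-72h}
    run docker container prune --force --filter "until=${age}"
    run docker image prune --force --filter "until=${age}"
    run docker network prune --force --filter "until=${age}"
}

prune_docker 72h
```

Setting `DRY_RUN=0` makes it actually call docker; wiring it into a Jenkins job or a cron entry on the ECS hosts is left to whatever scheduling you already have.
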
NOW FOR MY QUESTION!

Has ANYONE actually been able to setup a proper jenkins user on ECS that actually works for both a master and ephemeral jenkins-agents so that they can mount and use the docker.sock for builds without hitting permission issues? I'm talking using the ecs plugin and mounting docker.sock via that.

I have always resorted to running jenkins master and agents as root, which means you have to chmod files (super expensive in time and cpu for services with tons of files). Running microservices as root is obviously bad practice, and chmod-ing a zillion files is shit for docker cache and time... so I want to get jenkins users able to utilize the docker.sock. THIS IS SPECIFICALLY FOR THE AWS ECS AMI! I don't care about debian or old versions of docker where you could use DOCKER_OPTS. That doesn't work on the AWS Linux image.

Thanks! And happy Friday!
0 comments

dodger

October 27, 2018

5 votes
Consul Connect announcement: simple authorization + encryption mesh

Article 1646 words

1 comment

hashicorp.com

August 2, 2018

4 votes
Be nice to your DevOps team by not using Maven or kitchen sink frameworks.

Text 171 words


Maven tries to be the kitchen sink in a lot of ways - rigid requirements to use plugins instead of scripts, trying to wrap your scm, and even act as a docker wrapper... this is insanely frustrating and an anti-pattern for the rest of the software space. I would rather find a new job than work at a company that keeps pumping out maven and jhipster apps. It doesn't play nice with CI, it uses an insanely ugly configuration (xml) and most java developers don't even really know what they are doing when they are using it.

Making a micro-service api? You don't need jhipster or maven or even java - there are so many other better alternatives. Need something simple? flask. Need something performant? go. And there are so many others in between that won't give you a NullPointerException, require you to download the entire internet just to serve some serialized json, or make your devops team hate you.

Interested in hearing rebuttals and other people's alts and overall preferences.

5 comments

dodger

June 6, 2018

5 votes
Speeding up Zsh and Oh-My-Zsh

Article 1192 words, published May 16 2018

3 comments

jonlu.ca

June 6, 2018

7 votes
