• Activity
  • Votes
  • Comments
  • New
  • All activity
  • Showing only topics in ~comp with the tag "devops". Back to normal view / Search all groups
    1. A few easy linux commands, and a real-world example on how to use them in a pinch

      This below is a summary of some real-world performance investigation I recently went through. The tools I used are installed on all linux systems, but I know some people don't know them and would...

      This below is a summary of some real-world performance investigation I recently went through. The tools I used are installed on all linux systems, but I know some people don't know them and would straight up jump to heavyweight log analysis services and what not, or writing their own solution.

      Let's say you have request log sampling in a bunch of log files that contain lines like these:

      127.0.0.1 [2021-05-27 23:28:34.460] "GET /static/images/flags/2/54@3x.webp HTTP/2" 200 1806 TLSv1.3 HIT-CLUSTER SessionID:(null) Cache:max-age=31536000
      127.0.0.1 [2021-05-27 23:51:22.019] "GET /pl/player/123456/changelog/ HTTP/1.1" 200 16524 TLSv1.2 MISS-CLUSTER SessionID:(null) Cache:

      You might recognize Fastly logs there (IP anonymized). Now, there's a lot you might care about in this log file, but in my case, I wanted to get a breakdown of hits vs misses by URL.

      So, first step, let's concatenate all the log files with cat *.log > all.txt, so we can work off a single file.

      Then, let's split the file in two: hits and misses. There are a few different values for them, the majority are covered by either HIT-CLUSTER or MISS-CLUSTER. We can do this by just grepping for them like so:

      grep HIT-CLUSTER all.txt > hits.txt; grep MISS-CLUSTER all.txt > misses.txt
      

      However, we only care about url and whether it's a hit or a miss. So let's clean up those hits and misses with cut. The way cut works, it takes a delimiter (-d) and cuts the input based on that; you then give it a range of "fields" (-f) that you want.

      In our case, if we cut based on spaces, we end up with for example: 127.0.0.1 [2021-05-27 23:28:34.460] "GET /static/images/flags/2/54@3x.webp HTTP/2" 200 1806 TLSv1.3 HIT-CLUSTER SessionID:(null) Cache:max-age=31536000.

      We care about the 5th value only. So let's do: cut -d" " -f5 to get that. We will also sort the result, because future operations will require us to work on a sorted list of values.

      cut -d" " -f5 hits.txt | sort > hits-sorted.txt; cut -d" " -f5 misses.txt | sort > misses-sorted.txt
      

      Now we can start doing some neat stuff. wc (wordcount) is an awesome utility, it lets you count characters, words or lines very easily. wc -l counts lines in an input, since we're operating with one value per line we can easily count our hits and misses already:

      $ wc -l hits-sorted.txt misses-sorted.txt
        132523 hits-sorted.txt
        220779 misses-sorted.txt
        353302 total
      

      220779 / 132523 is a 1:1.66 ratio of hits to misses. That's not great…

      Alright, now I'm also interested in how many unique URLs are hit versus missed. uniq tool deduplicates immediate sequences, so the input has to be sorted in order to deduplicate our entire file. We already did that. We can now count our urls with uniq < hits-sorted.txt | wc -l; uniq < misses-sorted.txt | wc -l. We get 49778 and 201178, respectively. It's to be expected that most of our cache misses would be in "rarer" urls; this gives us a 1:4 ratio of cached to uncached URL.

      Let's say we want to dig down further into which URLs are most often hitting the cache, specifically. We can add -c to uniq in order to get a duplicate count in front of our URLs. To get the top ones at the top, we can then use sort, in reverse sort mode (-r), and it also needs to be numeric sort, not alphabetic (-n). head lets us get the top 10.

      $ uniq -c < hits-sorted.txt | sort -nr | head
          815 /static/app/webfonts/fa-solid-900.woff2?d720146f1999
          793 /static/app/images/1.png
          786 /static/app/fonts/nunito-v9-latin-ext_latin-regular.woff2?d720146f1999
          760 /static/CACHE/js/output.cee5c4089626.js
          758 /static/images/crest/3/light/notfound.png
          757 /static/CACHE/css/output.4f2b59394c83.css
          756 /static/app/webfonts/fa-regular-400.woff2?d720146f1999
          754 /static/app/css/images/loading.gif?d720146f1999
          750 /static/app/css/images/prev.png?d720146f1999
          745 /static/app/css/images/next.png?d720146f1999
      

      And same for misses:

      $ uniq -c < misses-sorted.txt | sort -nr | head
           56 /
           14 /player/237678/
           13 /players/
           12 /teams/
           11 /players/top/
      <snip>
      

      So far this tells us static files are most often hit, and for misses it also tells us… something, but we can't quite track it down yet (and we won't, not in this post). We're not adjusting for how often the page is hit as a whole, this is still just high-level analysis.

      One last thing I want to show you! Let's take everything we learned and analyze those URLs by prefix instead. We can cut our URLs again by slash with cut -d"/". If we want the first prefix, we can do -f1-2, or -f1-3 for the first two prefixes. Let's look!

      cut -d'/' -f1-2 < hits-sorted.txt | uniq -c | sort -nr | head
       100189 /static
         5948 /es
         3069 /player
         2480 /fr
         2476 /es-mx
         2295 /pt-br
         2094 /tr
         1939 /it
         1692 /ru
         1626 /de
      
      cut -d'/' -f1-2 < misses-sorted.txt | uniq -c | sort -nr | head
        66132 /static
        18578 /es
        17448 /player
        17064 /tr
        11379 /fr
         9624 /pt-br
         8730 /es-mx
         7993 /ru
         7689 /zh-hant
         7441 /it
      

      This gives us hit-miss ratios by prefix. Neat, huh?

      13 votes
    2. NixOS Configuration for a VPS

      Since I took so long to reply to Tips to use NixOS on a server? by @simao, I decided to create a new topic to share my configs. Hopefully this is informative for anyone looking to do similar...

      Since I took so long to reply to Tips to use NixOS on a server? by @simao, I decided to create a new topic to share my configs. Hopefully this is informative for anyone looking to do similar things - I'll also gladly take critiques, since my setup is probably not perfect.

      First, I will share the output of 'lsblk' on my VPS:

      NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
      vda       253:0    0   180G  0 disk  
      ├─vda1    253:1    0   512M  0 part  /boot
      └─vda2    253:2    0 179.5G  0 part  
        └─crypt 254:0    0 179.5G  0 crypt 
      

      That is, I use an unencrypted /boot partition, vda1, with GRUB 2 to prompt for a passphrase during boot, to unlock the LUKS encrypted vda2. I prefer to use ZFS as my file system for the encrypted drive, and LUKS rather than ZFS encryption. This is an MBR drive, since that's what my VPS provider uses, though UEFI would look the same. The particular way I do this also requires access through the provider's tools, and not ssh or similar. The hardware-configuration.nix file reflects this:

      Click to view the hardware configuration file
      # Do not modify this file!  It was generated by ‘nixos-generate-config’
      # and may be overwritten by future invocations.  Please make changes
      # to /etc/nixos/configuration.nix instead.
      { config, lib, pkgs, modulesPath, ... }:
      
      {
        imports =
          [ (modulesPath + "/profiles/qemu-guest.nix")
          ];
      
        boot.initrd.availableKernelModules = [ "aes_x86_64" "ata_piix" "cryptd" "uhci_hcd" "virtio_pci" "sr_mod" "virtio_blk" ];
        boot.initrd.kernelModules = [ ];
        boot.kernelModules = [ ];
        boot.extraModulePackages = [ ];
      
        fileSystems."/" =
          { device = "rpool/root/nixos";
            fsType = "zfs";
          };
      
        fileSystems."/home" =
          { device = "rpool/home";
            fsType = "zfs";
          };
      
        fileSystems."/boot" =
          { device = "/dev/disk/by-uuid/294de4f1-72e2-4377-b565-b3d4eaaa37b6";
            fsType = "ext4";
          };
      
        swapDevices = [ ];
      
      }
      
      I disobey the warning at the top to add `"aes_x86_64"` and `"cryptd"` to the available kernel modules, to speed up encryption. The `configuration.nix` follows:
      Click to view the configuration file
      # Edit this configuration file to define what should be installed on
      # your system.  Help is available in the configuration.nix(5) man page
      # and in the NixOS manual (accessible by running ‘nixos-help’).
      
      { config, lib, pkgs, ... }:
      
      {
        imports =
          [ # Include the results of the hardware scan.
            ./hardware-configuration.nix
          ];
      
        # Hardware stuff
        # add the following to hardware-configuration.nix - speeds up encryption
        #boot.initrd.availableKernelModules ++ [ "aes_x86_64" "cryptd" ];
        boot.initrd.luks.devices.crypt = {
          # Change this if moving to another machine!
          device = "/dev/disk/by-uuid/86090289-1c1f-4935-abce-a1aeee1b6125";
        };
        boot.kernelParams = [ "zfs.zfs_arc_max=536870912" ]; # sets zfs arc cache max target in bytes
        boot.supportedFilesystems = [ "zfs" ];
        nix.maxJobs = lib.mkDefault 6; # number of cpu cores
      
        # Use the GRUB 2 boot loader.
        boot.loader.grub.enable = true;
        boot.loader.grub.version = 2;
        # boot.loader.grub.efiSupport = true;
        # boot.loader.grub.efiInstallAsRemovable = true;
        # boot.loader.efi.efiSysMountPoint = "/boot/efi";
        # Define on which hard drive you want to install Grub.
        boot.loader.grub.device = "/dev/vda"; # or "nodev" for efi only
        boot.loader.grub.enableCryptodisk = true;
        boot.loader.grub.zfsSupport = true;
      
        networking.hostName = "m"; # Define your hostname.
        # networking.wireless.enable = true;  # Enables wireless support via wpa_supplicant.
      
        # The global useDHCP flag is deprecated, therefore explicitly set to false here.
        # Per-interface useDHCP will be mandatory in the future, so this generated config
        # replicates the default behaviour.
        networking.useDHCP = false;
        networking.interfaces.ens3.useDHCP = true;
        networking.hostId = "aoeu"; # set this to the first eight characters of /etc/machine-id for zfs
        networking.nat = {
          enable = true;
          externalInterface = "ens3"; # this may not be the interface name
          internalInterfaces = [ "wg0" ];
        };
        networking.firewall = {
          enable = true;
          allowedTCPPorts = [ 53 25565 ]; # open 53 for DNS and 25565 for Minecraft
          allowedUDPPorts = [ 53 51820 ]; # open 53 for DNS and 51820 for Wireguard - change the Wireguard port
        };
        networking.wg-quick.interfaces = {
          wg0 = {
            address = [ "10.0.0.1/24" "fdc9:281f:04d7:9ee9::1/64" ];
            listenPort = 51820;
            privateKeyFile = "/root/wireguard-keys/privatekey"; # fill this file with the server's private key and make it so only root has read/write access
      
            postUp = ''
              ${pkgs.iptables}/bin/iptables -A FORWARD -i wg0 -j ACCEPT
              ${pkgs.iptables}/bin/iptables -t nat -A POSTROUTING -s 10.0.0.1/24 -o ens3 -j MASQUERADE
              ${pkgs.iptables}/bin/ip6tables -A FORWARD -i wg0 -j ACCEPT
              ${pkgs.iptables}/bin/ip6tables -t nat -A POSTROUTING -s fdc9:281f:04d7:9ee9::1/64 -o ens3 -j MASQUERADE
            '';
      
            preDown = ''
              ${pkgs.iptables}/bin/iptables -D FORWARD -i wg0 -j ACCEPT
              ${pkgs.iptables}/bin/iptables -t nat -D POSTROUTING -s 10.0.0.1/24 -o ens3 -j MASQUERADE
              ${pkgs.iptables}/bin/ip6tables -D FORWARD -i wg0 -j ACCEPT
              ${pkgs.iptables}/bin/ip6tables -t nat -D POSTROUTING -s fdc9:281f:04d7:9ee9::1/64 -o ens3 -j MASQUERADE
            '';
      
            peers = [
              { # peer0
                publicKey = "{client public key}"; # replace this with the client's public key
                presharedKeyFile = "/root/wireguard-keys/preshared_from_peer0_key"; # fill this file with the preshared key and make it so only root has read/write access
                allowedIPs = [ "10.0.0.2/32" "fdc9:281f:04d7:9ee9::2/128" ];
              }
            ];
          };
        };
      
        # Configure network proxy if necessary
        # networking.proxy.default = "http://user:password@proxy:port/";
        # networking.proxy.noProxy = "127.0.0.1,localhost,internal.domain";
      
        nixpkgs.config = {
          allowUnfree = true; # don't set this if you want to ensure only free software
        };
      
        # Select internationalisation properties.
        i18n.defaultLocale = "en_US.UTF-8";
        console = {
          font = "Lat2-Terminus16";
          keyMap = "us";
        };
      
        # Set your time zone.
        time.timeZone = "America/New_York"; # set this to the same timezone your server is located in
      
        # List packages installed in system profile. To search, run:
        # $ nix search wget
        environment = {
          systemPackages = with pkgs; let
            nvimcust = neovim.override { # lazy minimal neovim config
              viAlias = true;
              vimAlias = true;
              withPython = true;
              configure = {
                packages.myPlugins = with pkgs.vimPlugins; {
                  start = [ deoplete-nvim ];
                  opt = [];
                };
                customRC = ''
                  if filereadable($HOME . "/.config/nvim/init.vim")
                    source ~/.config/nvim/init.vim
                  endif
      
                  set number
      
                  set expandtab
      
                  filetype plugin on
                  syntax on
      
                  let g:deoplete#enable_at_startup = 1
                '';
              };
            };
          in
          [
            jdk8
            nvimcust
            p7zip
            wget
            wireguard
          ];
        };
      
        # Some programs need SUID wrappers, can be configured further or are
        # started in user sessions.
        # programs.mtr.enable = true;
        # programs.gnupg.agent = {
        #   enable = true;
        #   enableSSHSupport = true;
        #   pinentryFlavor = "gnome3";
        # };
      
        # List services that you want to enable:
      
        # Enable the OpenSSH daemon.
        services = {
          dnsmasq = {
            enable = true;
            # this allows DNS requests from wg0 to be forwarded to the DNS server on this machine
            extraConfig = ''
              interface=wg0
            '';
          };
          fail2ban = {
            enable = true;
          };
          openssh = {
            enable = true;
            permitRootLogin = "no";
          };
          zfs = {
            autoScrub = {
              enable = true;
              interval = "monthly";
            };
          };
        };
      
        # Set sudo to request root password for all users
        # this should be changed for a multi-user server
        security.sudo.extraConfig = ''
          Defaults rootpw
        '';
      
        # Define a user account. Don't forget to set a password with ‘passwd’.
        users.users = {
          vpsadmin = { # admin account that has a password
            isNormalUser = true;
            home = "/home/vpsadmin";
            extraGroups = [ "wheel" ]; # Enable ‘sudo’ for the user.
            shell = pkgs.zsh;
          };
          mcserver = { # passwordless user to run a service - in this instance minecraft
            isNormalUser = true;
            home = "/home/mcserver";
            extraGroups = [];
            shell = pkgs.zsh;
          };
        };
      
        systemd = {
          services = {
            mcserverrun = { # this service runs a systemd sandboxed modded minecraft server as user mcserver
              enable = true;
              description = "Start and keep minecraft server running";
              wants = [ "network.target" ];
              after = [ "network.target" ];
              serviceConfig = {
                User = "mcserver";
                NoNewPrivileges = true;
                PrivateTmp = true;
                ProtectSystem = "strict";
                PrivateDevices = true;
                ReadWritePaths = "/home/mcserver/Eternal_current";
                WorkingDirectory = "/home/mcserver/Eternal_current";
                ExecStart = "${pkgs.jdk8}/bin/java -Xms11520M -Xmx11520M -server -XX:+AggressiveOpts -XX:ParallelGCThreads=3 -XX:+UseConcMarkSweepGC -XX:+UnlockExperimentalVMOptions -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent -XX:MaxGCPauseMillis=10 -XX:GCPauseIntervalMillis=50 -XX:+UseFastAccessorMethods -XX:+OptimizeStringConcat -XX:NewSize=84m -XX:+UseAdaptiveGCBoundary -XX:NewRatio=3 -jar forge-1.12.2-14.23.5.2847-universal.jar nogui";
                Restart = "always";
                RestartSec = 12;
              };
              wantedBy = [ "multi-user.target" ];
            };
            mcserverscheduledrestart = { # this service restarts the minecraft server on a schedule
              enable = true;
              description = "restart mcserverrun service";
              serviceConfig = {
                Type = "oneshot";
                ExecStart = "${pkgs.systemd}/bin/systemctl try-restart mcserverrun.service";
              };
            };
          };
          timers = {
            mcserverscheduledrestart = { # this timer triggers the service of the same name
              enable = true;
              description = "restart mcserverrun service daily";
              timerConfig = {
                OnCalendar = "*-*-* 6:00:00";
              };
              wantedBy = [ "timers.target" ];
            };
          };
        };
      
        # This value determines the NixOS release from which the default
        # settings for stateful data, like file locations and database versions
        # on your system were taken. It‘s perfectly fine and recommended to leave
        # this value at the release version of the first install of this system.
        # Before changing this value read the documentation for this option
        # (e.g. man configuration.nix or on https://nixos.org/nixos/options.html).
        system.stateVersion = "20.09"; # Did you read the comment?
      
      }
      
      You'll notice that this server acts as a Wireguard endpoint and as a Minecraft server. I described the first part on the [NixOS wiki page for Wireguard](https://nixos.wiki/wiki/Wireguard) under the section that mentions dnsmasq. The second part is done using NixOS's systemd support, which can be a bit confusing at first but is easy enough once you know how it works.

      Edit: Also, the provider I use is ExtraVM, who has been excellent.

      6 votes
    3. Tips to use NixOS on a server?

      I see some people using NixOs on their servers. I would like to try it out to self host some services and learn about NixOs. I use hetzner and they have an NixOs iso available so I can just use...

      I see some people using NixOs on their servers. I would like to try it out to self host some services and learn about NixOs.

      I use hetzner and they have an NixOs iso available so I can just use that to install NixOs. But how do people manage remote instances of NixOs? They would just use ansible or something like it, to run nix on the host, or is there a better way?

      Thanks

      11 votes
    4. In which a foolish developer tries DevOps: critique my VPS provisioning script!

      I'm attempting to provision two mirror staging and production environments for a future SaaS application that we're close to launching as a company, and I'd like to get some feedback on the...

      I'm attempting to provision two mirror staging and production environments for a future SaaS application that we're close to launching as a company, and I'd like to get some feedback on the provisioning script I've created that takes a default VPS from our hosting provider, DigitalOcean, and readies it for being a secure hosting environment for our application instance (which runs inside Docker, and persists data to an unrelated managed database).

      I'm sticking with a simple infrastructure architecture at the moment: A single VPS which runs both nginx and the application instance inside a containerised docker service as mentioned earlier. There's no load balancers or server duplication at this point. @Emerald_Knight very kindly provided me in the Tildes Discord with some overall guidance about what to aim for when configuring a server (limit damage as best as possible, limit access when an attack occurs)—so I've tried to be thoughtful and integrate that paradigm where possible (disabling root login, etc).

      I’m not a DevOps or sysadmin-oriented person by trade—I stick to programming most of the time—but this role falls to me as the technical person in this business; so the last few days has been a lot of reading and readying. I’ll run through the provisioning flow step by step. Oh, and for reference, Ubuntu 20.04 LTS.

      First step is self-explanatory.

      #!/bin/sh
      
      # Name of the user to create and grant privileges to.
      USERNAME_OF_ACCOUNT=
      
      sudo apt-get -qq update
      sudo apt install -qq --yes nginx
      sudo systemctl restart nginx
      

      Next, create my sudo user, add them to the groups needed, require a password change on first login, then copy across any provided authorised keys from the root user which you can configure to be seeded to the VPS in the DigitalOcean management console.

      useradd --create-home --shell "/bin/bash" --groups sudo,www-data "${USERNAME_OF_ACCOUNT}"
      passwd --delete $USERNAME_OF_ACCOUNT
      chage --lastday 0 $USERNAME_OF_ACCOUNT
      
      HOME_DIR="$(eval echo ~${USERNAME_OF_ACCOUNT})"
      mkdir --parents "${HOME_DIR}/.ssh"
      cp /root/.ssh/authorized_keys "${HOME_DIR}/.ssh"
      
      chmod 700 ~/.ssh
      chmod 600 ~/.ssh/authorized_keys
      chown --recursive "${USERNAME_OF_ACCOUNT}":"${USERNAME_OF_ACCOUNT}" "${HOME_DIR}/.ssh"

sudo chmod 775 -R /var/www
      sudo chown -R $USERNAME_OF_ACCOUNT /var/www
      rm -rf /var/www/html
      

      Installation of docker, and run it as a service, ensure the created user is added to the docker group.

      sudo apt-get install -qq --yes \
          apt-transport-https \
          ca-certificates \
          curl \
          gnupg-agent \
          software-properties-common
      
      curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
      sudo apt-key fingerprint 0EBFCD88
      
      sudo add-apt-repository --yes \
         "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
         $(lsb_release -cs) \
         stable"
      
      sudo apt-get -qq update
      sudo apt install -qq --yes docker-ce docker-ce-cli containerd.io
      
      # Only add a group if it does not exist
      sudo getent group docker || sudo groupadd docker
      sudo usermod -aG docker $USERNAME_OF_ACCOUNT
      
      # Enable docker
      sudo systemctl enable docker
      
      sudo curl -L "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
      sudo chmod +x /usr/local/bin/docker-compose
      sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
      docker-compose --version
      

      Disable root logins and any form of password-based authentication by altering sshd_config.

      sed -i '/^PermitRootLogin/s/yes/no/' /etc/ssh/sshd_config
      sed -i '/^PasswordAuthentication/s/yes/no/' /etc/ssh/sshd_config
      sed -i '/^ChallengeResponseAuthentication/s/yes/no/' /etc/ssh/sshd_config
      

      Configure the firewall and fail2ban.

      sudo ufw default deny incoming
      sudo ufw default allow outgoing
      sudo ufw allow ssh
      sudo ufw allow http
      sudo ufw allow https
      sudo ufw reload
      sudo ufw --force enable && sudo ufw status verbose
      
      sudo apt-get -qq install --yes fail2ban
      sudo systemctl enable fail2ban
      sudo systemctl start fail2ban
      

      Swapfiles.

      sudo fallocate -l 1G /swapfile && ls -lh /swapfile
      sudo chmod 0600 /swapfile && ls -lh /swapfile
      sudo mkswap /swapfile
      sudo swapon /swapfile && sudo swapon --show
      echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
      

      Unattended updates, and restart the ssh daemon.

      sudo apt install -qq unattended-upgrades
      sudo systemctl restart ssh
      

      Some questions

      You can assume these questions are cost-benefit focused, i.e. is it worth my time to investigate this, versus something else that may have better gains given my limited time.

      1. Obviously, any critiques of the above provisioning process are appreciated—both on the micro level of criticising particular lines, or zooming out and saying “well why don’t you do this instead…”. I can’t know what I don’t know.

      2. Is it worth investigating tools such as ss or lynis (https://github.com/CISOfy/lynis) to perform server auditing? I don’t have to meet any compliance requirements at this point.

      3. Do I get any meaningful increase in security by implementing 2FA on login here using google authenticator? As far as I can see, as long as I'm using best practices to actually ssh into our boxes, then the likeliest risk profile for unwanted access probably isn’t via the authentication mechanism I use personally to access my servers.

      4. Am I missing anything here? Beyond the provisioning script itself, I adhere to best practices around storing and generating passwords and ssh keys.

      Some notes and comments

      1. Eventually I'll use the hosting provider's API to spin up and spin down VPS's on the fly via a custom management application, which gives me an opportunity to programmatically execute the provisioning script above and run some over pre- and post-provisioning things, like deployment of the application and so forth.

      2. Usage alerts and monitoring is configured within DigitalOcean's console, and alerts are sent to our business' Slack for me to action as needed. Currently, I’m settling on the following alerts:
        1. Server CPU utilisation greater than 80% for 5 minutes.
        2. Server memory usage greater than 80% for 5 minutes.
        3. I’m also looking at setting up daily fail2ban status alerts if needed.
      9 votes
    5. Deploying containerized Docker instances in production?

      Hello! After spending many development hours in my past years running on Virtualbox/Vagrant-style setups, I've decided to take the plunge into learning Docker, and after getting a few containers...

      Hello! After spending many development hours in my past years running on Virtualbox/Vagrant-style setups, I've decided to take the plunge into learning Docker, and after getting a few containers working, I'm now looking to figure out how to deploy this to production. I'm not a DevOps or infrastructure guy, my bread and butter is software, and although I've become significantly better at deploying & provisioning Linux VPS's, I'm still not entirely confident in my ability to deploy & manage such systems at scale and in production. But, I am now close to running my own business, so these requirements are suddenly going from "nice to have" to "critical".

      As I mentioned, in the past when I've previously developed applications that have been pushed onto the web, I've tended to develop on my local machine, often with no specific configuration environment. If I did use an environment, it'd often be a Vagrant VM instance. From here, I'd push to GitHub, then from my VPS, pull down the changes, run any deployment scripts (recompile, restart nginx, etc), and I'm done.

      I guess what I'm after with Docker is something that's more consistent between dev, testing, & prod, and is also more hands off in the deployment process. Yet, what I'm currently developing still does have differing configuration needs between dev and prod. For example, I'd like to use a hosted DB solution such as DigitalOcean Managed Databases in production, yet I'm totally fine using a Docker container for MySQL for local development. Is something like this possible? Does anyone have any recommendations around how to accomplish this, any do's and dont's, or any catches that are worth mentioning?

      How about automating deployment from GitHub to production? I've never touched any CI/CD tools in my life, yet I know it's a hugely important part of the process when dealing with software in production, especially software that has clients dependent on it to function. Does anything specifically work well with Docker? Or GitHub? Ideally I want to be avoiding manual processes where I have to ssh in, and pull down the latest changes, half-remembering the commands I need to write to recompile and run the application again.

      10 votes
    6. Let's talk best-practice Jenkins on AWS ECS

      [seen on reddit but no discussion - if it's not okay to seek out better discussion here after seeing something fall flat on reddit, I am very sorry and I'll delete promptly] I've had some...

      [seen on reddit but no discussion - if it's not okay to seek out better discussion here after seeing something fall flat on reddit, I am very sorry and I'll delete promptly]

      I've had some experience in this realm for a while now, but I'm having a little trouble with one issue in particular. Before I divulge, I'll present my thoughts on best practice and and what I've been able to implement:

      • Terraform everything (in accordance to terragrunt's "style guide" i.e. organization)
        THIS IS A BIG ONE: for the jenkins master task, make sure to use the following args to make sure jenkins jobs aren't super slow as hell to start:
      -Djava.awt.headless=true -Dhudson.slaves.NodeProvisioner.initialDelay=0 -Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85
      

      THIS IS A GAME CHANGER (more-so on k8s clusters when the ecs plugin isn't used... hint, it's shit).

      • Create an EFS (in a separate terraform module) and mount it to the jenkins ECS cluster at /var/jenkins_home. Makes jenkins much more reliable through outages and easier to upgrade.
      • Run a logging agent (via docker container) like logspout or newrelic or whatever IN USER_DATA and not as a task - that way you get logs if there are issues during user_data/cloud_init... this I'm actually not sure about. Running a container outside the context of an ECS task means the ECS agent can't really track it and allocate mem/cpu properly... but it does help with user_data triage.
      • Use pipelines and git plugins to drive jobs. All jenkins jobs should be in source control!
      • Make sure you setup docker cleanup jobs on DAY 1! If you hace limited access to your cluster and you run out of disk due to docker cache, networks, volumes, etc... you're screwed till the admin ssh's in and runs a prune. Get a docker system prune going or the equivalent for each docker resource with appropriate filters... i.e. filter for anything older than a few days and is dangling.
      • Use Jenkins Global Libraries to make Jenkinsfiles cleaner (I always just use vars instead of groovy/java style packages because it's easier and less ugly)
        Jenkinsfiles should mostly call other bash files, make files, python scripts to generate and load prop files, etc. The less logic you put in a Jenkinsfile (which is just modified groovy) the better. String interpolation, among other things, is a fuckery that we don't have time to triage.
      • (out-of-scope) Move to using k8s/EKS instead of ECS asap because the ECS plugin for jenkins is absolute shit and it doesn't use priority correctly (sorry whoever developed it and... oh wait abandoned it and hasn't merged anything for years... for for real it's cool, just give admin to someone else).
      • (cultural) Stop calling them slaves. "Hey @eng, we're rotating slaves due to some cache issues. If you have been affected by race conditions in that past, our new update and slave rotation should fix that. Our update may have killed your job that was running on an old slave, just wait a few and the new slaves will be ready" <--This just doesn't look good.
        Hope that was some good stuff for you guys. Maybe I'm preaching to the choir, but I've seen some pretty shit jenkins setups.

      NOW FOR MY QUESTION!

      Has ANYONE actually been able to setup a proper jenkins user on ECS that actually works for both a master and ephemeral jenkins-agents so that they can mount and use the docker.sock for builds without hitting permission issues? I'm talking using the ecs plugin and mounting docker.sock via that.

      I have always resorted to running jenkins master and agents as root, which means you have to chmod files (super expensive time and cpu for services with tons of files). Running microservices as root is obviously bad practice, and chmod-ing a zilliion files is shit for docker cache and time... so I want to get jenkins users able to utilize the docker.sock. THIS IS SPECIFICALLY FOR THE AWS ECS AMI! I don't care about debian or old versions of docker where you could use DOCKER_OPTS. That doesn't work on the AWS Linux image.

      Thanks! And happy Friday!

      5 votes
    7. Be nice to your DevOps team by not using Maven or kitchen sink frameworks.

      Maven tries to be the kitchen sink in a lot of ways - rigid requirements to use plugins instead of scripts, trying to wrap your scm, and even act as a docker wrapper... this is insanely...

      Maven tries to be the kitchen sink in a lot of ways - rigid requirements to use plugins instead of scripts, trying to wrap your scm, and even act as a docker wrapper... this is insanely frustrating and an anti-patter for the rest of the software space. I would rather find a new job than work at a company that keeps pumping out maven and jhipster apps. It doesn't play nice with CI, it uses an insanely ugly configuration (xml) and most java developers don't even really know what they are doing when they are using it.

      Making a micro-service api? You don't need jhipster or maven or even java - there are so many other better alternatives. Need something simple? flask. Need something performant? go. And there are so many others in between that won't give you a NullPointerException, require you to download the entire internet just to serve some serialized json, or make your devops team hate you.

      Interested in hearing rebuttals and other peoples alts and overall preferences.

      5 votes