123 votes

Tildes will be down for most of this weekend

Tags: downtime

I'm going to be taking the site down this weekend to do some upgrades and changes to various systems it runs on. I'm planning to start the downtime somewhere around noon on Saturday, and have it running again on Sunday evening (vague times, in a vague North American timezone).

If you're interested in the details, the main reason is to switch operating systems from Ubuntu to Debian. The easiest and safest way to do this is by just setting up a new server and moving the site over, so I'll also be taking the opportunity to switch to a different physical server. Tildes has been running for 3 years now, so I'll be able to rent a new server that's some combination of faster and cheaper (not that we're getting anywhere close to the limits on this server, but I might as well).

Since I was having to review and adjust the whole server setup as part of switching OSes, I also decided to switch configuration-management systems from SaltStack to Ansible. Salt had a major vulnerability last year that compromised thousands of servers, and Tildes really only avoided being affected through the luck of using it in a non-vulnerable way. I've been intending to switch away from it ever since.

And since the site will be down anyway, I'll also be doing some upgrades and changes that are difficult to do without downtime, including upgrading PostgreSQL, Redis, and Python to their newest versions and making some changes to their setups.

So overall, this turned into a large project, and unfortunately it's one of those wonderful development projects where you do a ton of work and if everything goes well, nobody can even notice any difference from the way it was before. It also would have been possible to do all of this with only minimal downtime, but it would have required a lot more prep work and would be more stressful, so I'm just going to do it the easy way and take my time. If you're interested in doing a Screenless Day, maybe this could be a good time to do it!

29 comments

  1. Pistos

    Thanks for Tildes.

    32 votes
  2. aphoenix

    All great changes, lousy to have to spend a weekend on it. Good luck and godspeed.

    21 votes
  3. [8]
    Amarok

    I've spent more than a few Saturdays doing this very thing. Given your tech stack I doubt you'll run into any trouble - the benefits of simple, reliable, boring technology. ;)

    I know/love the micro-downtime when you push updates that hardly lasts longer than a page refresh. Is this the first real downtime we're having? I can't seem to remember the last time the site was offline post-launch.

    16 votes
    1. [7]
      Deimos

      > Is this the first real downtime we're having? I can't seem to remember the last time the site was offline post-launch.

      It is! There was a short one (10 minutes, maybe?) when I did a PostgreSQL version update in October 2019, but I think other than that we only ever had one or two for a couple of minutes when I did something dumb directly on the server.

      The site has never gone down "on its own" though. I set up a bunch of different monitoring to alert me if various things crashed or were going wrong, and none of it's ever gone off. I'm definitely not complaining about that, it's been nice to never need to worry about the site crashing when I'm unavailable.

      19 votes
      1. [3]
        Amarok

        I'd wager you just casually beat most if not all social network uptime records for the first years of operation - not that anyone's really keeping score. :P

        14 votes
        1. [2]
          Kuromantis

          To be fair, most well-known social media platforms came onto the scene when demand (and thus the traffic and upkeep costs needed to keep up with it) was increasing exponentially, unlike Tildes.

          3 votes
          1. Amarok

            Also to be equally fair, Tildes' design is better than those networks'. Most of their web pages are larger than the original Doom game at this point. All Tildes has to do is pump a little text around. It should scale well enough on that alone; even the current server could likely handle hundreds of thousands of active users without breaking a sweat.

            Simplicity and efficiency will carry you a long way when it's time to scale.

            13 votes
      2. [3]
        teaearlgraycold

        > and none of it's ever gone off

        I'd be worried that it just doesn't work!

        3 votes
        1. [2]
          whbboyd

          The first thing to do after setting up monitoring is to forkbomb your server just to make sure it works. ;)

          5 votes
          1. Ember
            (edited)

            Funny to hear forkbomb used in a useful context, takes me back to messing around with command prompt. Oh the days of hiding %0|%0 in a bat file and hoping someone would click... Now I want to set up better monitoring on my stuff so I can stress test it!

            2 votes
  4. hamstergeddon

    Thanks for the heads-up and the details! I love me some devops chat :)

    14 votes
  5. [6]
    Kuromantis
    (edited)

    > If you're interested in the details, the main reason is to switch operating systems from Ubuntu to Debian.

    Can you elaborate on the why of this change? You described why you'll move servers and ditch Salt and we know it's generally good to update to newer versions of anything like Python when given the chance, but not the OS change.

    12 votes
    1. Deimos

      There isn't really a single overwhelming reason, but the version of Ubuntu I'm using is at its end-of-support date, so I would have had to do some kind of OS switch regardless. It could have just been an upgrade to a newer Ubuntu, but those don't always go smoothly either.

      Ubuntu's based on Debian, and I overall have more trust in the work the Debian maintainers do to make sure everything's stable and secure on it. I was more familiar with Ubuntu, but I probably should have gone with Debian in the first place. I'm also a little uncomfortable with some of the directions I've seen Ubuntu moving in lately, including the push towards Snap packages.

      31 votes
    2. [5]
      Comment deleted by author
      1. Deimos

        I've read a little about Nix, but I don't really have a solid understanding of it at all. It has some great concepts, but it also seems really complex and unique.

        I'd probably like to tinker around with it on my own PC or maybe a VPS that hosts some unimportant stuff, but I'd be too scared of using it for something like Tildes. I'd be worried that I'd run into some kind of obscure problem that I have no idea how to fix myself and can't find any info online about. It's nice to do things in a boring way where it's always easy to find solutions.

        16 votes
      2. [2]
        streblo

        Debian seems like a better choice if you just want something that works and stays out of the way. Using Ansible or Docker for your configuration management is going to be more straightforward than wrestling with a somewhat alien package management system. And you will wrestle with it at first.

        Don't get me wrong, I think NixOS is pretty cool, but unless you want to use it or have a good reason to, there are more straightforward options available.

        8 votes
        1. teaearlgraycold

          I vastly prefer just paying Heroku to take care of it.

          1 vote
      3. teaearlgraycold

        > I'd just like to interject for a moment. What you're referring to as Linux, is in fact, GNU/Linux, or as I've recently taken to calling it, GNU plus Linux. Linux is not an operating system unto itself, but rather another free component of a fully functioning GNU system.

        Copy-pasta aside, I think it's reasonable to define an operating system as any set of programs bundled together that alone are meant to provide an end user with some base functionality. So the definition of an OS is really very muddy.

        1 vote
  6. Wes

    Aye aye, captain! Thanks for the heads up and good luck.

    9 votes
  7. kfwyre

    > If you're interested in doing a Screenless Day, maybe this could be a good time to do it!

    I didn't actually get to do mine on the designated weekend, so this is a good push for me to actually fit it in during this upcoming one.

    Good luck with the upgrade! Thanks for everything you do for this place!

    7 votes
  8. ImmobileVoyager

    From my end, it looks like nothing happened at all. Mission accomplished.

    6 votes
  9. Muffin

    Thank you so much for the work you do!

    5 votes
  10. Jedi

    Good luck!
    See y’all on the other side.

    4 votes
  11. [4]
    UntouchedWagons

    Make sure you have backups first!

    4 votes
    1. [3]
      Amarok

      Make sure your backups aren't write-only first. I've had that happen a few times and it's never fun. :P

      6 votes
      1. [2]
        UntouchedWagons

        How'd you manage that?

        3 votes
        1. Amarok

          I inherited a datacenter full of about 60 machines that all had tape drive backups, running locally, to 60 tape drives, attached locally, and all using whatever rando backup software/system the developers of that particular project decided to use. Literally none of them were backing up what was required for a proper bare metal restore when I got them. :/

          Virtual machines are a blessing. I don't miss those shenanigans.

          7 votes
  12. mrbig

    Good luck, see you on the other side!

    4 votes
  13. [2]
    Kuromantis

    Can you give us a warning when you're about to shut the site down like say, 30 minutes before you do it?

    2 votes
    1. Deimos

      About an hour from now.

      9 votes
  14. Comment removed by site admin