41 votes

Google Cloud accidentally deletes UniSuper’s online account due to ‘unprecedented misconfiguration’

35 comments

  1. [11]
    scherlock
    Link
    Telling that the service is back up due to backups stored off Google Cloud.

    27 votes
    1. [10]
      Omnicrola
      Link Parent
      I mean, that's just disaster recovery best practice. Never keep all your eggs in one basket, etc. In the days before cloud, you never kept backups in the same physical building because in the unlikely chance the building burned down your company was 100% screwed.

      33 votes
      1. [8]
        Nazarie
        Link Parent
        We would keep one set of backups in the DC and one set was picked up and stored in a vault in a remote location on a daily basis, both duplicate drives and tape archives.

        11 votes
        1. [4]
          cfabbro
          (edited)
          Link Parent
          Just make sure you include regular backup integrity checks in your routine... we made a lot of money from people who failed to do that and only found out after a critical data loss that their backups weren't actually functional. /former data recovery tech ;)

          24 votes
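One way to automate the integrity check described above is to compare each backup file against a checksum recorded when the backup was written. This is only a minimal sketch: the manifest format (file name mapped to SHA-256 hex digest) and the function names `sha256_of` and `verify_backups` are assumptions made for illustration, not something from the thread.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backups(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of backups that are missing or whose checksum differs.

    `manifest` maps file name -> expected SHA-256 hex digest
    (a hypothetical format chosen for this sketch).
    """
    failures = []
    for name, expected in sorted(manifest.items()):
        candidate = backup_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            failures.append(name)
    return failures
```

A matching checksum only shows that the file is unchanged since the manifest was written. It does not prove the backup can actually be restored, so a check like this would complement a real restore test rather than replace one.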
          1. overbyte
            Link Parent
            We had automated rebuilds of database instances for that reason.

            It's paid off hard in catching a lot of gotchas before we actually needed the backups, like sorting out the order of restoring and loading encryption keys so the backups could be decrypted, or pinning down pymongo to specific versions so we can rebuild a replicaset of an ancient version of MongoDB (3.x) that we need to keep running until the data has been migrated off.

            6 votes
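The ordering problem mentioned above (the encryption key has to be loaded before a backup can be decrypted and restored) can be illustrated with a toy restore drill. Everything here is hypothetical: the step functions, the shared `state` dictionary, and the `run_restore_drill` name are stand-ins, and a real drill would call actual key-management and database tooling instead.

```python
from typing import Callable

# A step is a (name, function) pair; each function reads and updates shared state.
Step = tuple[str, Callable[[dict], None]]


def run_restore_drill(steps: list[Step]) -> list[str]:
    """Run steps in order and stop at the first failure, naming the failed step."""
    state: dict = {}
    completed: list[str] = []
    for name, step in steps:
        try:
            step(state)
        except Exception as exc:
            raise RuntimeError(
                f"restore drill failed at step {name!r}; completed: {completed}"
            ) from exc
        completed.append(name)
    return completed


def load_key(state: dict) -> None:
    state["key"] = b"demo-key"  # placeholder for fetching the real key


def decrypt_backup(state: dict) -> None:
    if "key" not in state:
        raise KeyError("encryption key not loaded")
    state["plaintext"] = True  # placeholder for the real decryption


def restore_database(state: dict) -> None:
    if not state.get("plaintext"):
        raise ValueError("backup is still encrypted")
    state["restored"] = True  # placeholder for the real restore
```

Running the steps as `load_key`, `decrypt_backup`, `restore_database` completes; swapping the first two fails at the decrypt step. That is the kind of gotcha a scheduled rebuild is meant to surface before the backups are needed for real.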
          2. [2]
            Nazarie
            Link Parent
            We had monthly verification. It was a military installation and DR was a critical aspect of our operations. Not a normal DR though as it included attack resilient engineering, recovery from EMP, and other fun things.

            5 votes
            1. cfabbro
              Link Parent
              Oh damn. Super cool! That sounds absolutely fascinating! If you even can (assuming it isn't classified), I would love to hear about some of the stuff you worked on. EMP recovery especially sounds really interesting!

              4 votes
        2. [2]
          roo1ster
          Link Parent
          I worked at a dotcom 15+ years ago that had been off-siting its backups to a safe deposit box at the bank a block away. That was great until the first time they needed the data, which was on a Friday evening... still better than no off-site backup at all, though.

          7 votes
          1. Nazarie
            Link Parent
            Ours were kept in a secure vault we had access to, 24/7. But yeah, I can see how something so simple would be overlooked. Back then I wasn't the one building the plan, just implementing it on an ongoing basis.

            5 votes
        3. Carighan
          Link Parent
          Yeah same. Live system + online/hot backup + regular cold backup for most systems. Extra warm backup on another data center if needed.

          2 votes
      2. scherlock
        Link Parent
        My point was that GCP obviously hard deleted everything. Many providers will soft delete, then hard delete after x days when it's the provider doing the deletion.

        8 votes
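The soft-delete-then-hard-delete pattern mentioned above can be sketched as a small in-memory store. This is an illustration of the general idea only, under assumed names (`SoftDeleteStore`, `purge_expired`) and an assumed retention window; it does not describe how Google Cloud or any other provider implements deletion.

```python
from datetime import datetime, timedelta


class SoftDeleteStore:
    """Toy key-value store where deletes are reversible until a retention window passes."""

    def __init__(self, retention: timedelta):
        self.retention = retention
        self._items: dict[str, bytes] = {}
        self._deleted_at: dict[str, datetime] = {}

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._deleted_at.pop(key, None)

    def get(self, key: str) -> bytes:
        if key in self._deleted_at:
            raise KeyError(key)  # soft-deleted items are hidden from normal reads
        return self._items[key]

    def soft_delete(self, key: str, now: datetime) -> None:
        if key not in self._items:
            raise KeyError(key)
        self._deleted_at[key] = now  # mark only; the data stays on disk

    def restore(self, key: str) -> None:
        if key not in self._deleted_at:
            raise KeyError(key)
        del self._deleted_at[key]

    def purge_expired(self, now: datetime) -> list[str]:
        """Hard-delete items whose retention window has elapsed; return their keys."""
        expired = sorted(
            key for key, when in self._deleted_at.items()
            if now - when >= self.retention
        )
        for key in expired:
            del self._items[key]
            del self._deleted_at[key]
        return expired
```

With a 30-day retention, an item deleted by mistake can still be brought back with `restore` on day 29. Once `purge_expired` has run after day 30 the data is gone, which is the point at which an independent backup becomes the only recourse.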
  2. [3]
    norb
    Link
    Wonder how many first-line support reps they had to go through before getting to someone who could 1) figure out it was not an end-user problem, 2) admit it was Google's problem, and 3) coordinate a restore from a third party?

    11 votes
    1. Greg
      Link Parent
      They’ll definitely have a named account manager and customer engineer, so I imagine it got picked up pretty quickly and then immediately turned into a very bad day for everyone in the chain it got passed up to. That tipping point where your organisation spends enough for interactions with a provider to flip from faceless ticketing system to actual specific human with a face and a direct email address and phone number is a game changer.

      For me, that makes it all the worse being stuck in the lower circles of first line support hell the rest of the time: there’s incontrovertible proof that these companies can do support properly, they just either don’t care about average customers at all or they actively want to make the support experience frustrating enough that we’ll go away and stop bothering them.

      15 votes
    2. TurtleCracker
      Link Parent
      I would be concerned how many small businesses this exact same thing happened to with no recourse. Some of these big companies are nearly impossible to contact for support.

      12 votes
  3. [3]
    MimicSquid
    Link
    Can someone with more tech background than I have explain what "an inadvertent misconfiguration during provisioning of UniSuper’s Private Cloud services" actually IS?

    5 votes
    1. cfabbro
      (edited)
      Link Parent
      Until we see a postmortem we can't know for sure exactly what caused it, and I doubt any speculation will be entirely accurate since it was supposedly due to "a combination of rare issues" and "one-of-a-kind occurrence". If I had to guess though, it very likely has something to do with this power failover event in the australia-southeast1 region on May 8: https://status.cloud.google.com/incidents/5feV12qHeQoD3VdD8byK

      Multiple Google Cloud products experienced service disruptions of varying impact and duration, with the longest lasting being 2 hours and 55 minutes in the australia-southeast1 region. From preliminary analysis, the root cause of this incident is currently believed to be an unplanned power event caused by a power failover due to a utility company outage. Google will complete a full Incident Report in the following days that will provide a detailed root cause.

      Customer Impact:

      • Persistent Disk - impacted users experienced slow or unavailable devices.
      • Google Cloud Dataflow - impacted users experienced an increase in streaming jobs with watermarks in australia-southeast1-a zone for a duration of 30 minutes.
      • Google Cloud Pub/Sub - users experienced an increased error rate for “Publish requests” for a duration of about 35 minutes.
      • Google Big Query - impacted users experienced failures for BigQuery jobs in the australia-southeast1 region.
      • Google Compute Engine - impacted VMs went into repair mode for about 45 minutes.
      • Cloud Filestore - multiple Filestore instances in australia-southeast1-a were unavailable and had missing metrics for a duration of 2 hours 55 minutes, with the last impacted instance confirmed to have recovered at 21:43 PT.
      • Virtual Private Cloud (VPC) - the impacted users experienced packet loss, unavailability of existing VMs and delays while creating new VMs.
      • Cloud SQL - impacted users experienced errors when accessing their Cloud SQL database instances in the australia-southeast1-a zone.
      • Cloud Logging - Cloud Logging experienced a minor increase in ingestion error in australia-southeast1 for a duration of 15 minutes.
      • Cloud Bigtable - users experienced a high error rate in the impacted region for a duration of about 25 minutes.
      • Cloud Apigee - impacted users received 5XX and 2XX error for a duration of 30 minutes.

      Additional details:
      After service mitigation and full closure of the incident, there was continued Persistent Disk impact for a narrowed group of customers identified. This has since been resolved with no further isolated impact.

      15 votes
    2. em-dash
      Link Parent
      inadvertent misconfiguration: a thing was accidentally set up subtly wrong

      provisioning: initial setup of new servers

      private cloud services: normal cloud services (i.e. they run their stuff on computers provided by Google), but separated by firewalls from other companies' cloud things

      That's still uselessly vague, of course. I don't think anyone's given more specific details on what actually happened.

      9 votes
  4. first-must-burn
    Link
    PSA: Synology NAS has backup apps for all the cloud storage providers, including a "no delete" option. You add your credentials and it keeps everything synced locally.

    Obviously the Unisuper thing is far beyond that in scope, but for personal users, there are some good options to protect your cloud data.

    4 votes
  5. [13]
    ebonGavia
    Link
    Incredible. Google has really lost the plot. I'm sure Pichai will use this as an opportunity to cut thousands more jobs.

    I really can't understand how anyone can trust them with anything anymore. I will never use Apple products so I'm kind of at a loss. Got to install GrapheneOS.

    12 votes
    1. [2]
      skybrian
      Link Parent
      This seems like an overreaction. We have only a vague idea of how it happened. Hopefully a postmortem will be published, and then we can talk about what they should have done.

      Compare with how crashes are handled in the airline industry. It seems that at least nobody died.

      I would like to see easier ways for ordinary consumers to automatically back up Google-hosted data, though.

      22 votes
      1. MimicSquid
        Link Parent
        One of my clients has a Synology home server that comes with a tool for automatic backup from Google Drive. It didn't require any significant tech skills to set up, though it's totally possible that my sense of what is a significant level of tech skill is broken.

        3 votes
    2. [9]
      teaearlgraycold
      Link Parent
      Why have you blacklisted Apple products?

      3 votes
      1. [8]
        Acorn_CK
        Link Parent
        Not OP, but personally: because fuck the walled garden business strategy. Also the Apple name surcharge, although that isn't as bad nowadays (I think?).

        9 votes
        1. [7]
          teaearlgraycold
          Link Parent
          Regarding the surcharge, at least for non-budget laptops they're probably the best value out there. For phones they're ridiculous - you can spend over $1000 on an iPhone 15 and still be locked to a 60 Hz screen!

          10 votes
          1. [6]
            sleepydave
            Link Parent
            I find it difficult to see macbooks as "value" devices when they're peddling non-upgradable 8GB models for a base price of 1000 USD. Apple understands that its target demographic is not tech-savvy, and is banking on the fact that the vast majority won't ever realise the blatant price gouging Apple has been inflicting on its customers.

            2 votes
            1. [5]
              teaearlgraycold
              Link Parent
              I don't know what kind of sales tactics they use when people ask about the different models. Depending on the instructions for their sales-people my feelings on the 8GB models could vary. To be honest, I do believe that a large chunk of Macbook customers just want an iPad with a keyboard. The newest iPad Pro has 8GB of RAM (more if you get the 1TB+ storage models) and no one would question whether that's an appropriate amount of memory.

              For normal computer use cases I agree 8GB of RAM isn't enough and I'd rather see the base model come with 16. The non-upgrade-able part plays into their profits, but it's also not really possible for them to sell that hardware with upgrade-able RAM. They're all on the same motherboard and while the 8GB model has a similar amount of memory bandwidth as a nice x86 computer, the higher-end models can have as much as 400GB/s of memory bandwidth. You won't get that on SODIMMs. And then there's the (IMO less important) matter of the device's size which couldn't be maintained with RAM sockets.

              6 votes
              1. [4]
                sleepydave
                Link Parent
                Don't get me wrong, I'm not saying macbooks are unusable - I used to daily drive one for about 6 years. I think Apple's greatest accomplishment to this day has been making the first palatable Unix OS for the average end user. I'm just fed up with seeing the anti-consumer business practices, including but not limited to, the obnoxious memory pricing and the "just buy your mom an iPhone" mantra.

                6 votes
                1. [3]
                  teaearlgraycold
                  Link Parent
                  Yeah I’m not trying to fanboy here. But they’ve absolutely got the laptop market cornered. Except for running PC games I can’t think of any price point above $900 at which I’d recommend anything but a Macbook. Basically, they need better competition to wipe off their smug smiles. And I’d love to see that happen.

                  6 votes
                  1. [2]
                    MimicSquid
                    Link Parent
                    But why do people need $900+ of laptop? It seems like at that point a workstation is more useful most of the time, and a lighter weight and cheaper portable option for the times when you're travelling. Not to say that there aren't people who need a heavyweight laptop, but is it really that many?

                    3 votes
                    1. teaearlgraycold
                      Link Parent
                      I think you’re right regarding performance. The step-function improvement with battery life and build quality by spending enough for a Macbook makes it easier to justify. But not necessary by any means and I certainly enjoyed my tricked out T430 back in college.

                      I hope that eventually the battery life of Apple’s laptops trickles down to the lower end of the market.

                      1 vote
    3. llehsadam
      Link Parent
      You can’t really trust any of these services, but this doesn’t mean you have to give up the ease of use they offer. For the free stuff, just make sure to use more than one service provider; for paid things, keep a local backup.

      Same thing for Apple, why not use the walled garden if it’s possible to keep the garden gate to some extent open?

      3 votes
  6. lou
    Link
    Hacker News seems oddly quiet about this.

    2 votes